Abstract

Given a training sample of size $m$ from a $d$-dimensional population, we
wish to allocate a new observation $Z \in \mathbb{R}^d$ to this population or to the noise.
We suppose that the difference between the distribution of the population
and that of the noise is only in a shift, which is a sparse vector. For
Gaussian noise, fixed sample size $m$, and dimension $d$ tending to infinity, we
obtain the sharp classification boundary and we propose classifiers attaining
this boundary. We also give extensions of this result to the case where the
sample size $m$ depends on $d$ and satisfies the condition $(\log m)/\log d \to \gamma$,
$0 \le \gamma < 1$, and to the case of non-Gaussian noise satisfying the Cramér
condition.

Keywords: Bayes risk, classification boundary, high-dimensional data, optimal 
classifier, sparse vectors 

1 Research partially supported by the RFBI Grant 08-01-00692-a and by Grant
NSh-638.2008.1.

2 Research partially supported by the grant ANR-07-BLAN-0234 and by PICS-2715.

3 Research partially supported by the grant ANR-06-BLAN-0194, by the PASCAL
Network of Excellence and Isaac Newton Institute for Mathematical Sciences
in Cambridge (Statistical Theory and Methods for Complex, High-Dimensional 
Data Programme, 2008). 



1 Introduction 

1.1 Model and problem 

Let $X = (X_1, \dots, X_n)$ and $Y = (Y_1, \dots, Y_m)$ be two i.i.d. samples from two
different populations with probability distributions $P_X$ and $P_Y$ on $\mathbb{R}^d$, respectively.
Here

$$X_i = (X_i^1, \dots, X_i^d), \qquad Y_j = (Y_j^1, \dots, Y_j^d),$$

where $X_i^k$ and $Y_j^k$ are the components of $X_i$ and $Y_j$. We consider the problem of
discriminant analysis when the dimension $d$ of the observations is very large (tends
to $+\infty$). Assume that we observe a random vector $Z = (Z^1, \dots, Z^d)$ independent
of $(X, Y)$ and we know that the distribution of $Z$ is either $P_X$ or $P_Y$. Our aim is to
classify $Z$, i.e., to decide whether $Z$ comes from the population with distribution
$P_X$ or from that with distribution $P_Y$.
In this paper we assume that

$$X_i^k = v_k + \xi_i^k, \qquad Y_j^k = u_k + \eta_j^k, \qquad (1.1)$$

where $v = (v_1, \dots, v_d)$ and $u = (u_1, \dots, u_d)$ are deterministic mean vectors and the
errors $\xi_i^k$, $\eta_j^k$ are (unless other conditions are explicitly mentioned)
jointly i.i.d. zero-mean random variables with probability density $f$ on $\mathbb{R}$.

Distinguishing between $P_X$ and $P_Y$ presents a difficulty only when the vectors
$v$ and $u$ are close to each other. A particular type of closeness for large $d$ can be
characterized by the sparsity assumption [9], [1] that we shall adopt in this paper.
As in [9], [1], we introduce the following set of sparse vectors in $\mathbb{R}^d$ characterized by
a positive number $a_d$ and a sparsity index $\beta \in (0, 1]$:

$$U_{\beta, a_d} = \Big\{ u = (u_1, \dots, u_d) : u_k = a_d \varepsilon_k,\ \varepsilon_k \in \{0, 1\},\ c\, d^{1-\beta} \le \sum_{k=1}^{d} \varepsilon_k \le C d^{1-\beta} \Big\}.$$

Here $0 < c < C < +\infty$ are two constants that are supposed to be fixed throughout
the paper. The value $p = d^{-\beta}$ can be interpreted as the "probability" of occurrence
of non-zero components in the vector $u$.
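
To make the sparsity constraint concrete, the following Python sketch draws one vector from the set of sparse vectors defined above. The function name, the default constants c = 1 and C = 2, and the uniformly random choice of the support are assumptions made for illustration only; the definition constrains only the number of non-zero components.

```python
import math
import random


def sparse_mean_vector(d, beta, a_d, c=1.0, C=2.0, seed=0):
    """Return u with u_k = a_d * eps_k, eps_k in {0, 1}, and
    c * d**(1 - beta) <= sum(eps_k) <= C * d**(1 - beta).

    c, C and the uniform random support are illustrative assumptions.
    """
    rng = random.Random(seed)
    lo = math.ceil(c * d ** (1.0 - beta))
    hi = min(d, math.floor(C * d ** (1.0 - beta)))
    if lo > hi:
        raise ValueError("no admissible number of non-zero components")
    n_nonzero = rng.randint(lo, hi)            # admissible support size
    support = set(rng.sample(range(d), n_nonzero))
    return [a_d if k in support else 0.0 for k in range(d)]


u = sparse_mean_vector(d=10_000, beta=0.5, a_d=0.3)
n_nonzero = sum(1 for u_k in u if u_k != 0.0)
print(n_nonzero, n_nonzero / len(u))  # support size and its fraction of d
```

With $\beta = 1/2$ and $d = 10^4$ the support size is of order $d^{1-\beta} = 100$, so the fraction of non-zero components is of order $p = d^{-\beta}$.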

In what follows we shall deal only with a special case of model (1.1) that was
also considered recently by [3]. Namely, we assume:

$$v = 0, \qquad u \in U_{\beta, a_d}.$$

In this paper we establish the classification boundary, i.e., we specify the necessary
and sufficient conditions on $\beta$ and $a_d$ such that successful classification is possible.
Let us first define the notion of successful classification. We shall need some
notation. Let $\psi$ be a decision rule, i.e., a measurable function of $X, Y, Z$ with
values in $[0, 1]$. If $\psi = 0$ we allocate $Z$ to the $P_X$-population, whereas for $\psi = 1$ we
allocate $Z$ to the $P_Y$-population. The rules $\psi$ taking intermediate values in $(0, 1)$
can be interpreted as randomized decision rules. Let $P_{H_0}^{(u)}$ and $P_{H_1}^{(u)}$ denote the joint
probability distributions of $X, Y, Z$ when $Z \sim P_X$ and $Z \sim P_Y$ respectively, and
let $E_{H_0}^{(u)}$, $\mathrm{Var}_{H_0}^{(u)}$ and $E_{H_1}^{(u)}$, $\mathrm{Var}_{H_1}^{(u)}$ denote the corresponding expectation and variance
operators. We shall also denote by $P^{(u)}$ the distribution of $Y$ and by $E^{(u)}$, $\mathrm{Var}^{(u)}$
the corresponding expectation and variance operators. Consider the Bayes risk

$$\mathcal{R}_B(\psi) = \pi E_{H_0}^{(u)}(\psi) + (1 - \pi) E_{H_1}^{(u)}(1 - \psi),$$

where $0 < \pi < 1$ is a prior probability of the $P_X$-population, and the maximum
risk

$$\mathcal{R}_M(\psi) = \max\big( E_{H_0}^{(u)}(\psi),\ E_{H_1}^{(u)}(1 - \psi) \big).$$

Let $\mathcal{R}(\psi)$ be either the Bayes risk $\mathcal{R}_B(\psi)$ or the maximum risk $\mathcal{R}_M(\psi)$.
We shall say that successful classification is possible if $\beta$ and $a_d$ are such that

$$\lim_{d \to +\infty} \inf_{\psi} \sup_{u \in U_{\beta, a_d}} \mathcal{R}(\psi) = 0 \qquad (1.2)$$

for $\mathcal{R} = \mathcal{R}_M$ and $\mathcal{R} = \mathcal{R}_B$ with any fixed $0 < \pi < 1$. Conversely, we say that
successful classification is impossible if $\beta$ and $a_d$ are such that

$$\liminf_{d \to +\infty} \inf_{\psi} \sup_{u \in U_{\beta, a_d}} \mathcal{R}(\psi) = \mathcal{R}_{\max}, \qquad (1.3)$$

where $\mathcal{R}_{\max} = 1/2$ for $\mathcal{R} = \mathcal{R}_M$ and $\mathcal{R}_{\max} = \min(\pi, 1 - \pi)$ for $\mathcal{R} = \mathcal{R}_B$ with
$0 < \pi < 1$.

We call (1.2) the upper bound of classification and (1.3) the lower bound of
classification. The lower bound (1.3) for the maximum risk $\mathcal{R} = \mathcal{R}_M$ is interpreted
as the fact that no decision rule is better (in a minimax sense) than the simple
random guess. For the Bayes risk $\mathcal{R}_B$, the lower bound (1.3) is attained at the
degenerate decision rule that does not depend on the observations: $\psi = 0$ if $\pi > 1/2$
or $\psi = 1$ if $\pi < 1/2$.

The condition on $(\beta, a_d)$ corresponding to the passage from (1.2) to (1.3) is
called the classification boundary. We shall say that a classifier $\psi = \psi_d$ is asymptotically
optimal (or that $\psi$ attains the classification boundary) if, for all $\beta$ and $a_d$
such that successful classification is possible, we have

$$\lim_{d \to +\infty} \sup_{u \in U_{\beta, a_d}} \mathcal{R}(\psi) = 0, \qquad (1.4)$$

where $\mathcal{R} = \mathcal{R}_M$ or $\mathcal{R} = \mathcal{R}_B$ with any fixed $0 < \pi < 1$.

1.2 Main results 

According to the value of $\beta$, we shall distinguish between moderately sparse vectors
and highly sparse vectors. This division depends on the relation between $m$
and $d$. For $m$ not too large, i.e., when $\log m = o(\log d)$, moderately sparse vectors
correspond to $\beta \in (0, 1/2]$ and highly sparse vectors to $\beta \in (1/2, 1)$. For
large $m$, i.e., when $\log m \sim \gamma \log d$, $\gamma \in (0, 1)$, moderate sparsity corresponds to
$\beta \in (0, (1 - \gamma)/2]$ and high sparsity to $\beta \in ((1 - \gamma)/2, 1 - \gamma)$.

The classification boundary for moderately sparse vectors is obtained in a relatively
simple way (cf. Section 3). It is of the form

$$R_d = d^{1/2 - \beta} a_d \asymp 1. \qquad (1.5)$$

This means that successful classification is possible if $R_d \to +\infty$, and it is impossible
if $R_d \to 0$ as $d \to +\infty$. The result is valid both for $\beta \in (0, 1/2]$ with $m \ge 1$
fixed and for $m$ depending on $d$ such that $\log m \sim \gamma \log d$, $\gamma \in (0, 1)$, as $d \to +\infty$ with
$\beta \in (0, (1 - \gamma)/2]$. Moreover, (1.5) holds under weak assumptions on the noise. In
particular, for the upper bound of classification we only need to assume that the
noise has mean zero and finite second moment (cf. Section 3). The lower bound is
proved under a mild regularity condition on the density $f$ of the noise.

The case of highly sparse vectors is more involved. We establish the classification
boundary for the following scenarios:

(A) $m \ge 1$ is a fixed integer, and the noise density $f$ is Gaussian $\mathcal{N}(0, \sigma^2)$ with
known or unknown $\sigma > 0$;

(B) $m \to +\infty$ as $d \to +\infty$, $\log m = o(\log d)$, and $f$ is Gaussian $\mathcal{N}(0, \sigma^2)$ with
known or unknown $\sigma > 0$;

(C) $\log m \sim \gamma \log d$, $\gamma \in (0, 1)$, and $f$ is Gaussian $\mathcal{N}(0, \sigma^2)$ with known or
unknown $\sigma > 0$.

The upper bounds are extended to the following additional scenario:

(D) $m \to +\infty$ as $d \to +\infty$, $\log m \sim \gamma \log d$, $0 < \gamma < 1$, $m/\log d \to +\infty$, and the
noise satisfies the Cramér condition.

The conditions on the noise in (A)-(D) are crucial and, as we shall see later, they
suggest that a special dependence of $a_d$ on $d$ and $m$ of the form $a_d \asymp \sqrt{(\log d)/m}$
is meaningful in the highly sparse case. More specifically, we take

$$a_d = s\sigma \sqrt{\log d}, \qquad x_1 = s\sqrt{m + 1}, \qquad (1.6)$$

where $x_1 > 0$ is fixed. The classification boundary in (A, B, D) is then expressed
by the following condition on $\beta$, $s$ and $m$:

$$x_1 = \phi(\beta), \qquad (1.7)$$

where

$$\phi(\beta) = \begin{cases} \phi_1(\beta) & \text{if } 1/2 < \beta \le 3/4, \\ \phi_2(\beta) & \text{if } 3/4 < \beta < 1, \end{cases} \qquad (1.8)$$

with

$$\phi_1(\beta) = \sqrt{2\beta - 1}, \qquad \phi_2(\beta) = \sqrt{2}\,\big(1 - \sqrt{1 - \beta}\big). \qquad (1.9)$$

In other words, successful classification is possible if $x_1 \ge \phi(\beta) + \delta$, and it is
impossible if $x_1 \le \phi(\beta) - \delta$, for any $\delta > 0$ and $d$ large enough. This classification
boundary is also extended to the case where $x_1$ depends on $d$ but stays bounded.
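
The boundary function is easy to evaluate numerically. The sketch below assumes the forms $\phi_1(\beta) = \sqrt{2\beta - 1}$ and $\phi_2(\beta) = \sqrt{2}(1 - \sqrt{1 - \beta})$, which agree at $\beta = 3/4$ with common value $1/\sqrt{2}$; the helper `regime` and its tolerance `delta` are hypothetical names introduced only for this illustration.

```python
import math


def phi1(beta):
    """phi_1(beta) = sqrt(2*beta - 1)."""
    return math.sqrt(2.0 * beta - 1.0)


def phi2(beta):
    """phi_2(beta) = sqrt(2) * (1 - sqrt(1 - beta))."""
    return math.sqrt(2.0) * (1.0 - math.sqrt(1.0 - beta))


def phi(beta):
    """Boundary function: phi_1 on (1/2, 3/4], phi_2 on (3/4, 1)."""
    if not 0.5 < beta < 1.0:
        raise ValueError("beta must lie in (1/2, 1)")
    return phi1(beta) if beta <= 0.75 else phi2(beta)


def regime(x1, beta, delta=1e-3):
    """Locate (x1, beta) relative to the boundary x1 = phi(beta)."""
    if x1 >= phi(beta) + delta:
        return "possible"
    if x1 <= phi(beta) - delta:
        return "impossible"
    return "boundary"


for beta in (0.6, 0.75, 0.9):
    print(beta, phi(beta), regime(0.5, beta))
```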

For Scenario (C) let $a_d = \sigma x \sqrt{(\log d)/m}$ with fixed $x > 0$. We show that in
this framework successful classification is impossible if $\beta > 1 - \gamma$ (cf. 1° in Section
2), and therefore we are interested in $\beta \in ((1 - \gamma)/2, 1 - \gamma)$. Set
$\beta^* = \beta/(1 - \gamma) \in (1/2, 1)$ and $x^* = x/\sqrt{1 - \gamma}$. Then the classification boundary is of the form

$$x^* = \phi(\beta^*)$$

for the function $\phi(\beta)$ defined above.
Note that if $f$ is known, the distribution $P_X$ is also known. This means that
we do not need the sample $X$ to construct decision rules. Thus, in Scenarios (A),
(B) and (C) when $\sigma$ is known we can suppose w.l.o.g. that only the sample $Y$ is
available; this remark remains valid in the case of unknown $\sigma$, as we shall see
later. As to Scenario (D), we shall also treat it under the assumption that only the
sample $Y$ is available (w.l.o.g. if $f$ is known), to be consistent with other results.
However, if $f$ is not known, the sample $X$ contains additional information which
can be used. The results for this case under Scenario (D) are similar to those that
we obtain below but they are beyond the scope of the paper.

For $m = 0$ (i.e., when there is no sample $Y$) the problem that we consider here
reduces to the problem of signal detection in growing dimension d, cf. [6j [7J El [9j 
[TUt [TJ [12], and our classification boundary coincides with the detection boundary 
established in [Sj. Sharp asymptotics in the detection problem was studied in [6] 
(see also [9], Chapter 8) for known $a_d$ or $\beta$. The adaptive problem (this corresponds
to unknown $a_d$ and $\beta$) was studied in [7], [8]. Various procedures attaining the
detection boundary were proposed in PUJCIJCI2]- Ingster and Suslina [10J introduced 
a method attaining the detection boundary based on the combination of three 
different procedures for the zones (3 G (0,1/2], (3 G (1/2,3/4] and (3 G (3/4,1). 
Later Donoho and Jin [1] showed that a test based on the higher criticism statistic 
attains the detection boundary simultaneously for these zones. More recently Jager 
and Wellner [12] proved that the same is true for a large class of statistics including 
the higher criticism statistic. 

The paper of Hall et al. [4] deals with the same classification model as the one
we consider here but studies a problem which is different from ours. They analyse the
conditions under which some simple (for example, minimum distance) classifiers $\psi$
satisfy

lim E^\ip) = 0. (1.10) 

Hall et al. [4] conclude that for minimum distance classifiers (1.10) holds if and
only if $0 < \beta < 1/2$. This implies that such classifiers cannot be optimal for
$1/2 < \beta < 1$. They also derive (1.10) for some other classifiers in the case $m = 1$.

The results of this paper and their extensions to the multi-class setting were 
summarized in [14] and presented at the Meeting "Rencontres de Statistique 
Mathematique" (Luminy, December 16-21, 2008) and at the Oberwolfach meeting 
"Sparse Recovery Problems in High Dimensions: Statistical Inference and Learning 
Theory" (March 15-21, 2009). In a work parallel to ours, Donoho and Jin [2], [3]
and Jin [11] independently and contemporaneously have analysed a setting less
general than the present one. They did not consider a minimax framework, but
rather demonstrated that the higher criticism (HC) methodology can be successfully
extended to the classification problem. Donoho and Jin [3] showed that, for a
special case of Scenario (B), the "ideal" HC statistic attains the same upper bound
of classification that we prove below. Together with our lower bound, this implies
that the "ideal" HC statistic is asymptotically optimal, in the sense defined above,
for the Scenario (B). Donoho and Jin announce that similar results for the HC
statistic in Scenarios (A) and (C) will appear in their work in preparation.

This paper is organized as follows. Section 2 contains some preliminary remarks.
In Section 3 we present the classification boundary and asymptotically
optimal classifier for moderately sparse vectors under rather general conditions on
the noise. In Section 4 we give the classification boundary and asymptotically optimal
classifiers for highly sparse vectors under Scenarios (A), (B) and (C). Section
5 provides an extension to Scenario (D). Proofs of the lower and upper bounds of
classification are given in Sections 6 and 7, respectively.

2 Preliminary remarks 

In this section we collect some basic remarks on the problem assuming that $f$ is the
standard Gaussian density. As a starting point, we discuss some natural limitations
for $a_d$.

1°. Remark that $a_d$ cannot be too small. Indeed, assume that instead of the set
$U_{\beta, a_d}$ we have only one vector $u = (a_d \varepsilon_1, \dots, a_d \varepsilon_d)$ with known $\varepsilon_k \in \{0, 1\}$. Then
we get a familiar problem of classification with two given Gaussian populations.
The notion of classification boundary can be defined here in the same terms as
above, and the explicit form of the boundary can be derived from the standard
textbook results. It is expressed through the behavior of $Q_d^2 = a_d^2 \sum_{k=1}^{d} \varepsilon_k$:

• if $Q_d \to 0$, then successful classification is impossible:

$$\liminf_{d \to +\infty} \inf_{\psi} \mathcal{R}(\psi) = \mathcal{R}_{\max},$$

• if $Q_d \to +\infty$, then successful classification is realized by the maximum likelihood
classifier $\psi^* = 1_{\{T^* > 0\}}$ where

$$T^* = \sum_{k = 1:\, \varepsilon_k = 1}^{d} \big( Z^k - a_d/2 \big).$$

Here and below $1_{\{\cdot\}}$ denotes the indicator function.

If we assume that $\sum_{k=1}^{d} \varepsilon_k \asymp d^{1-\beta}$, we immediately obtain some consequences
for our model defined in Section 1. We see that successful classification in that
model is impossible if $a_d$ is so small that $d^{1-\beta} a_d^2 = o(1)$, and it makes sense to
consider only such $a_d$ that

$$d^{1-\beta} a_d^2 \to +\infty. \qquad (2.1)$$

In particular, for $\gamma > 1 - \beta$ successful classification is impossible under Scenario
(C) with $a_d \asymp \sqrt{(\log d)/m}$.

We shall see later that (2.1) is a rough condition, which is necessary but not
sufficient for successful classification in the model of Section 1. For example, in
that model with $\beta \in (0, 1/2]$ and fixed $m$, the value $a_d$ should be substantially
larger than given by the condition $d^{1-\beta} a_d^2 \asymp 1$, cf. (1.5).

2°. Our second remark is that, on the other extreme, for sufficiently large $a_d$
the problem is trivial. Specifically, non-trivial results can be expected only under
the condition

$$x = a_d \sqrt{m/\log d} \le 2\sqrt{2}. \qquad (2.2)$$

Indeed, assume that

$$x > 2\sqrt{2}. \qquad (2.3)$$

Then the problem becomes simple in the sense that successful classification is easily
realisable under (2.3) and the classical condition (2.1). Indeed, take an analog of
the statistic $T^*$ where $a_d$ and $\varepsilon_k$ are replaced by their natural estimators:

$$T = \sum_{k=1}^{d} \Big( Z^k - \frac{SY^k}{2\sqrt{m}} \Big) \hat\varepsilon_k, \quad \text{with} \quad SY^k = \frac{1}{\sqrt{m}} \sum_{i=1}^{m} Y_i^k, \quad \hat\varepsilon_k = 1_{\{SY^k > \sqrt{2 \log d}\}}, \qquad (2.4)$$

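and consider the classifier $\psi = 1_{\{T > 0\}}$. We can write $SY^k = \varepsilon_k a_d \sqrt{m} + \zeta_k$ where
$\zeta_k$ are independent standard normal random variables. It is well known that
$\max_{k=1,\dots,d} |\zeta_k| \le \sqrt{2 \log d}$ with probability tending to 1 as $d \to +\infty$. This and
(2.3) imply that, with probability tending to 1, the vector $(\hat\varepsilon_1, \dots, \hat\varepsilon_d)$ recovers
exactly $(\varepsilon_1, \dots, \varepsilon_d)$ and the statistic $T$ coincides with

$$\tilde T = \sum_{k = 1:\, \varepsilon_k = 1}^{d} \Big( Z^k - \frac{SY^k}{2\sqrt{m}} \Big).$$

Since $E^{(u)}(SY^k/\sqrt{m}) = \varepsilon_k a_d$ and $\mathrm{Var}^{(u)}(SY^k/\sqrt{m}) = 1/m$, we find:

$$E_{H_0}^{(u)}(\tilde T) = -\frac{a_d}{2} \sum_{k=1}^{d} \varepsilon_k, \qquad E_{H_1}^{(u)}(\tilde T) = \frac{a_d}{2} \sum_{k=1}^{d} \varepsilon_k,$$

$$\mathrm{Var}_{H_0}^{(u)}(\tilde T) = \mathrm{Var}_{H_1}^{(u)}(\tilde T) = \Big( 1 + \frac{1}{4m} \Big) \sum_{k=1}^{d} \varepsilon_k.$$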

It follows from Chebyshev's inequality that under (2.1) we have

$$E_{H_0}^{(u)}(\psi) = P_{H_0}^{(u)}(T > 0) \to 0, \qquad E_{H_1}^{(u)}(1 - \psi) = P_{H_1}^{(u)}(T \le 0) \to 0 \qquad (2.5)$$

as $d \to +\infty$. Note that this argument is applicable in the general model of Section
1 (since the convergence in (2.5) is uniform in $u \in U_{\beta, a_d}$), implying successful
classification by $\psi$ under conditions (2.1) and (2.3).
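
The thresholding rule of remark 2° can be sketched in a few lines of Python. The sketch assumes unit noise variance and the reading of (2.4) in which $SY^k = m^{-1/2} \sum_i Y_i^k$ is compared with $\sqrt{2 \log d}$; the function name and the tiny noise-free data set are illustrative assumptions.

```python
import math


def threshold_classifier(Z, Y):
    """Return 1 if T > 0 and 0 otherwise, with T as in (2.4).

    Z: list of d floats (new observation).
    Y: list of m lists of d floats (training sample), unit noise variance.
    """
    d, m = len(Z), len(Y)
    threshold = math.sqrt(2.0 * math.log(d))
    T = 0.0
    for k in range(d):
        SY_k = sum(Y[i][k] for i in range(m)) / math.sqrt(m)
        if SY_k > threshold:                       # estimated eps_k = 1
            T += Z[k] - SY_k / (2.0 * math.sqrt(m))
    return 1 if T > 0 else 0


# Noise-free toy data: one strong coordinate, d = 4, m = 1.
Y = [[5.0, 0.0, 0.0, 0.0]]
print(threshold_classifier([5.0, 0.0, 0.0, 0.0], Y))  # Z carries the shift
print(threshold_classifier([0.0, 0.0, 0.0, 0.0], Y))  # Z has no shift
```

In the toy data only the first coordinate passes the threshold, so $T$ reduces to $Z^1 - SY^1/2$, whose sign separates the two cases.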

3°. Let us now discuss a connection between conditions (2.1) and (2.3). First,
(2.3) implies (2.1) if $m$ is not too large:

$$m = o(d^{1-\beta} \log d). \qquad (2.6)$$

On the other hand, if $m$ is very large:

$$\exists\, b > 0 : \quad m \ge b\, d^{1-\beta} \log d, \qquad (2.7)$$

then we have $x^2 \ge b\, a_d^2 d^{1-\beta}$, and condition (2.1) implies (2.3). Thus, the relation

$$d^{1-\beta} a_d^2 \asymp 1$$

determines the classification boundary in the general model of Section 1 if $m$ is
very large (satisfies (2.7)).

4°. Finally, note that we can control conditions (2.3) and (2.2) by their data-driven
counterparts. In fact, $\max_{k=1,\dots,d} \zeta_k \le \sqrt{2 \log d}$ with probability tending
to 1 as $d \to +\infty$. Hence, if (2.2) holds, then $M_Y = \max_{1 \le k \le d} SY^k \le 3\sqrt{2 \log d}$
with the same probability. It is therefore convenient to consider the following pre-classifier
taking values in $\{0, 1, ND\}$ (ND means "No Decision", i.e., we need to
switch to some other classifier):

$$\psi^{pre} = \begin{cases} 0 & \text{if } T \le 0,\ M_Y > 3\sqrt{2 \log d}, \\ 1 & \text{if } T > 0,\ M_Y > 3\sqrt{2 \log d}, \\ ND & \text{if } M_Y \le 3\sqrt{2 \log d}, \end{cases}$$

where $T$ is given by (2.4). The argument in 2° implies that $\psi^{pre}$ classifies successfully
if ND is not chosen. Under condition (2.2) the pre-classifier chooses ND with
probability tending to 1 and then we apply one of the classifiers suggested below
in this paper. We prove their optimality under assumption (2.2).
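
A minimal Python sketch of the pre-classifier follows, assuming unit noise variance, the statistic $T$ of (2.4) and the threshold $3\sqrt{2 \log d}$ for $M_Y$. The function name and the string `"ND"` used for the "No Decision" outcome are choices made for this illustration.

```python
import math


def pre_classifier(Z, Y):
    """Return 0, 1 or "ND" following the pre-classifier of remark 4°."""
    d, m = len(Z), len(Y)
    root = math.sqrt(2.0 * math.log(d))
    SY = [sum(Y[i][k] for i in range(m)) / math.sqrt(m) for k in range(d)]
    if max(SY) <= 3.0 * root:
        return "ND"                                # switch to another classifier
    # Statistic T of (2.4): keep coordinates with SY^k above sqrt(2 log d).
    T = sum(Z[k] - SY[k] / (2.0 * math.sqrt(m))
            for k in range(d) if SY[k] > root)
    return 1 if T > 0 else 0


strong = [[20.0, 0.0, 0.0, 0.0]]   # M_Y far above the threshold
weak = [[2.0, 0.0, 0.0, 0.0]]      # M_Y below the threshold
print(pre_classifier([20.0, 0.0, 0.0, 0.0], strong))
print(pre_classifier([0.0, 0.0, 0.0, 0.0], strong))
print(pre_classifier([2.0, 0.0, 0.0, 0.0], weak))
```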

The above remarks can be easily extended to the case of Gaussian errors with
known variance $\sigma^2 > 0$ by using the normalization $Z^k/\sigma$, $SY^k/\sigma$. Moreover, they
extend to the case of non-Gaussian errors under the Cramér condition and the
additional assumption $m/\log d \to +\infty$ (cf. Section 5).



3 Classification boundary for moderately sparse 
vectors 

In this section we consider the case of moderately sparse vectors. To simplify the
notation, we set without loss of generality $\sigma = 1$. Assume that $R_d = d^{1/2 - \beta} a_d$
satisfies

$$\lim_{d \to +\infty} R_d = +\infty \qquad (3.1)$$

and consider the classifier based on a linear statistic:

$$\psi^{lin} = 1_{\{T' > 0\}}, \qquad T' = \sum_{k=1}^{d} \Big( Z^k - \frac{1}{2m} \sum_{i=1}^{m} Y_i^k \Big).$$

Note that $T'$ is similar to the statistic $T$ defined in (2.4) with the difference that
in $T'$ we do not threshold to estimate the positions of non-zero $\varepsilon_k$. Indeed, here
we do not necessarily assume (2.3), and thus there is no guarantee that $\varepsilon_k$ can be
correctly recovered.

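Assume that $\eta_j^k$ and $\xi_i^k$ for all $k, j, i$ are random variables with zero mean and
variance 1 (we do not suppose here that $\eta_j^k$ have the same distribution as $\xi_i^k$).
Then the means of $Y_j^k$ and $Z^k$ are $E^{(u)}(Y_j^k) = E_{H_1}^{(u)}(Z^k) = \varepsilon_k a_d$, $E_{H_0}^{(u)}(Z^k) = 0$, their
variances are equal to 1, and we have:

$$E_{H_0}^{(u)}(T') = -\frac{a_d}{2} \sum_{k=1}^{d} \varepsilon_k, \qquad E_{H_1}^{(u)}(T') = \frac{a_d}{2} \sum_{k=1}^{d} \varepsilon_k,$$

$$\mathrm{Var}_{H_0}^{(u)}(T') = \mathrm{Var}_{H_1}^{(u)}(T') = d \Big( 1 + \frac{1}{4m} \Big).$$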

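We consider now a vector $u \in \mathbb{R}^d$ of the form

$$u = (u_1, \dots, u_d) : \quad u_k = a_d \varepsilon_k, \quad \varepsilon_k \in \{0, 1\}, \quad \sum_{k=1}^{d} \varepsilon_k \ge c\, d^{1-\beta}. \qquad (3.2)$$

By (3.2), Chebyshev's inequality and (3.1), we obtain

$$E_{H_0}^{(u)}(\psi^{lin}) = P_{H_0}^{(u)}(T' > 0) \le P_{H_0}^{(u)}\big( T' - E_{H_0}^{(u)}(T') \ge c\, d^{1-\beta} a_d/2 \big) \le \frac{4d}{(c\, d^{1-\beta} a_d)^2} \Big( 1 + \frac{1}{4m} \Big) \to 0$$

as $d \to +\infty$. An analogous argument yields that $E_{H_1}^{(u)}(1 - \psi^{lin}) \to 0$. The convergence
here is uniform in $u$ satisfying (3.2), and thus uniform in $u \in U_{\beta, a_d}$. Therefore, we
have the following result.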

Theorem 3.1 Let $\eta_j^k$ and $\xi_i^k$ for all $k, j, i$ be random variables with zero mean and
variance 1. If (3.1) holds, then successful classification is possible and it is realized
by the classifier $\psi^{lin}$.

Remark 3.1 We have proved theorem 3.1 with the set of vectors $u$ defined by
(3.2), which is larger than $U_{\beta, a_d}$. The upper bound on $\sum_k \varepsilon_k$ in the definition of
$U_{\beta, a_d}$ is not needed. Also the $\eta_j^k$ need not have the same distribution as the $\xi_i^k$ and
their variances need not be equal to 1. It is easy to see that the result of theorem 3.1
remains valid if these random variables have unknown variances uniformly bounded
by an (unknown) constant.
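
The linear classifier needs neither the noise level nor the support of $u$. The Python sketch below assumes the statistic $T' = \sum_k (Z^k - \frac{1}{2m} \sum_i Y_i^k)$, i.e., each coordinate of $Z$ is compared with half of the corresponding training mean; the function name and the toy data are illustrative assumptions.

```python
def linear_classifier(Z, Y):
    """Return 1 if T' > 0 and 0 otherwise.

    T' sums, over all d coordinates, Z^k minus half of the training mean
    of coordinate k; no thresholding of coordinates is performed.
    """
    d, m = len(Z), len(Y)
    T_lin = sum(Z[k] - sum(Y[i][k] for i in range(m)) / (2.0 * m)
                for k in range(d))
    return 1 if T_lin > 0 else 0


Y = [[1.0, 1.0], [1.0, 1.0]]              # m = 2, d = 2, training mean 1
print(linear_classifier([1.0, 1.0], Y))   # T' = 2 * (1 - 0.5) = 1
print(linear_classifier([0.0, 0.0], Y))   # T' = 2 * (0 - 0.5) = -1
```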

The corresponding lower bound is given in the next theorem. For $a > 0$, $t \in \mathbb{R}$,
set

$$\ell_a(t) = f(t - a)/f(t), \qquad D_a = \int \ell_a^2(t) f(t)\, dt,$$

and

$$D_d(m, a, \beta) = d^{1 - 2\beta} D_a^m (D_a - 1).$$

Theorem 3.2 Let either $m \ge 1$ be fixed or $m = m_d \to +\infty$. If

$$\lim_{d \to +\infty} D_d(m, a_d, \beta) = 0, \qquad (3.3)$$

then successful classification is impossible.

Proof of theorem 3.2 is given in Section 6.



Corollary 3.1 Let $f$ be the density of the standard normal distribution. If

$$\lim_{d \to +\infty} R_d = 0, \qquad (3.4)$$

then successful classification is impossible for $\beta \in (0, 1/2]$ and $m$ fixed or for
$\beta \in (0, 1/2)$ and $m = m_d \to +\infty$ such that $m = O(d^{1 - 2\beta})$.

Proof. For the standard normal errors we have $D_a = e^{a^2}$. Therefore, condition
(3.3) can be satisfied only if $m a_d^2 = o(1)$ as $d \to +\infty$. Moreover, in this case

$$D_d(m, a_d, \beta) \asymp d^{1 - 2\beta} a_d^2 (1 + m a_d^2) \asymp R_d^2. \qquad (3.5)$$

Thus, if $m a_d^2 = o(1)$, conditions (3.3) and (3.4) are equivalent. Now, (3.4) and the
assumption $\beta \in (0, 1/2]$ imply $a_d = o(1)$. This proves the corollary for fixed $m$.
Also, if $\beta \in (0, 1/2)$ and $m = m_d \to +\infty$ such that $m = O(d^{1 - 2\beta})$, then
$m a_d^2 = O(R_d^2) = o(1)$. □
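
The identity $D_a = e^{a^2}$ for standard normal $f$ can be checked numerically. The Python sketch below uses a plain trapezoidal rule; the integration range, the number of nodes and the function names are assumptions chosen for the illustration, and `D_d` simply evaluates $d^{1-2\beta} D_a^m (D_a - 1)$ with the Gaussian value of $D_a$.

```python
import math


def gaussian_D_a(a, lo=-12.0, hi=12.0, n=24_000):
    """Trapezoidal approximation of D_a = int l_a(t)^2 f(t) dt, f standard normal."""
    h = (hi - lo) / n
    total = 0.0
    for j in range(n + 1):
        t = lo + j * h
        l_a = math.exp(a * t - 0.5 * a * a)        # f(t - a) / f(t)
        f_t = math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)
        weight = 0.5 if j in (0, n) else 1.0       # trapezoid end-point weights
        total += weight * l_a * l_a * f_t
    return total * h


def D_d(m, a, beta, d):
    """D_d(m, a, beta) = d^(1 - 2 beta) * D_a^m * (D_a - 1) with D_a = exp(a^2)."""
    D_a = math.exp(a * a)
    return d ** (1.0 - 2.0 * beta) * D_a ** m * (D_a - 1.0)


print(gaussian_D_a(0.5), math.exp(0.25))   # numerical value vs closed form
print(D_d(m=5, a=0.01, beta=0.4, d=10_000))
```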

Remark 3.2 Relation (3.5) is valid for a larger class of noise distributions, e.g.,
for non-Gaussian noise with finite Fisher information. Indeed, assume that $\ell_a(t)$ is
$L_2(f)$-differentiable at the point $a = 0$, i.e., there exists a function $\ell'(\cdot)$ such that

$$\| \ell_a(\cdot) - 1 - a \ell'(\cdot) \|_f = o(a), \qquad 0 < \| \ell'(\cdot) \|_f < +\infty, \qquad (3.6)$$

where $\| g(\cdot) \|_f^2 = \int g^2(x) f(x)\, dx$. Observe that

$$\| \ell'(\cdot) \|_f^2 = \int \frac{(f'(x))^2}{f(x)}\, dx = I(f)$$

is the Fisher information of $f$ (with $f'$ defined in a somewhat stronger sense than,
for instance, in [5]). Under assumption (3.6) we have

$$D_a = 1 + \| \ell_a(\cdot) - 1 \|_f^2, \qquad \| \ell_a(\cdot) - 1 \|_f^2 = a^2 (I(f) + o(1))$$

as $a \to 0$.

Combining remarks 3.1 and 3.2 with theorems 3.1 and 3.2 we see that relation
(1.5) determines the classification boundary for $\beta \in (0, 1/2]$ and fixed $m$ or for
$\beta \in (0, 1/2)$ and $m \to +\infty$, $m = O(d^{1 - 2\beta})$, if the errors have zero mean, finite
variance and finite Fisher information.

As corollaries of theorems 3.1 and 3.2 we can establish classification boundaries
for particular choices of $a_d$. Recall that non-trivial results can be expected only if
$a_d$ satisfies (2.2). For instance, consider $a_d = d^{-s}$ with some $s > 0$. Then for fixed
$m$ the classification boundary in the region $\beta \in (0, 1/2]$ is given by $s = 1/2 - \beta$, i.e.,
successful classification is possible if $s < 1/2 - \beta$, and is impossible if $s > 1/2 - \beta$.
Other choices of $a_d$ appear to be less interesting when $\beta \in (0, 1/2]$. For example,
in the next section we consider the sequence $a_d = s\sigma \sqrt{(\log d)/m}$ with some $s > 0$.
If $a_d$ is chosen in this way, successful classification is possible for all $\beta \in (0, 1/2]$
with no exception, so that there is no classification boundary in this range of $\beta$.

Finally, note that theorem 3.1 is valid for all $\beta \in (0, 1)$. However, for $\beta > 1/2$
its assumption $\lim_{d \to +\infty} R_d = +\infty$ guaranteeing successful classification is much
too restrictive as compared to the correct classification boundary that we shall
derive in the next section. The lower bound of theorem 3.2 is also valid for all
$\beta \in (0, 1)$. However, we shall see in the next section that it is not tight for highly
sparse vectors when $\beta > 3/4$ (cf. proof of theorem 4.1).

4 Classification boundary for highly sparse vectors

We now analyse the case of highly sparse vectors, i.e., we suppose that $\beta \in (1/2, 1)$
if $\log m = o(\log d)$, and $\beta^* = \beta/(1 - \gamma) \in (1/2, 1)$ if $\log m \sim \gamma \log d$, $\gamma \in (0, 1)$.
We shall show that the classification boundary for this case is expressed in terms
of the function

$$\phi(\beta) = \begin{cases} \phi_1(\beta) & \text{if } 1/2 < \beta \le 3/4, \\ \phi_2(\beta) & \text{if } 3/4 < \beta < 1, \end{cases}$$

where the functions $\phi_1$ and $\phi_2$ are defined in (1.9). Note that $\phi_1$ and $\phi_2$ are
monotone increasing on $(1/2, 1)$, satisfy $\phi_1(\beta) \le \phi_2(\beta)$ for all $\beta \in (1/2, 1)$, and the
equality $\phi_1(\beta) = \phi_2(\beta)\ (= 1/\sqrt{2})$ holds if and only if $\beta = 3/4$.
The following notation will be useful in the sequel: 

$$T_d = \sqrt{\log d}, \qquad s = s_d = a_d/(\sigma T_d), \qquad (4.1)$$

and

$$x = s\sqrt{m}, \qquad x_0 = s m/\sqrt{m + 1}, \qquad x_1 = s\sqrt{m + 1}, \qquad x^* = \frac{x}{\sqrt{1 - \gamma}}. \qquad (4.2)$$

Clearly, $x_0 \le x \le x_1$. We allow $s, x, x_0, x_1$ to depend on $d$ but do not indicate
this dependence in the notation for the sake of brevity. We shall also suppose
throughout that (2.2) holds, so that $x_1 = O(1)$ as $d \to +\infty$.
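
The normalized quantities are simple functions of $a_d$, $\sigma$, $m$ and $d$. The Python sketch below assumes the definitions $T_d = \sqrt{\log d}$, $s = a_d/(\sigma T_d)$, $x = s\sqrt{m}$, $x_0 = sm/\sqrt{m+1}$, $x_1 = s\sqrt{m+1}$ and $x^* = x/\sqrt{1-\gamma}$; the function name and the dictionary keys are hypothetical.

```python
import math


def normalized_signal(a_d, sigma, m, d, gamma=0.0):
    """Return s, x, x0, x1 and x_star for given a_d, sigma, m, d and gamma."""
    T_d = math.sqrt(math.log(d))
    s = a_d / (sigma * T_d)
    x = s * math.sqrt(m)
    return {
        "s": s,
        "x": x,
        "x0": s * m / math.sqrt(m + 1),
        "x1": s * math.sqrt(m + 1),
        "x_star": x / math.sqrt(1.0 - gamma),
    }


q = normalized_signal(a_d=1.0, sigma=1.0, m=3, d=1000)
print(q)
print(q["x0"] <= q["x"] <= q["x1"])   # ordering x0 <= x <= x1
```

For $m = 3$ the three quantities are $1.5s$, $\sqrt{3}s$ and $2s$, which makes the ordering easy to verify by hand.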

4.1 Lower bound 

The next theorem gives a lower bound of classification for highly sparse vectors. 

Theorem 4.1 Let the noise density $f$ be Gaussian $\mathcal{N}(0, \sigma^2)$, $\sigma^2 > 0$. Assume
that $\beta \in (1/2, 1)$ and $\limsup_{d \to +\infty} x_1 < \phi(\beta)$. Then successful classification is
impossible for fixed $m$ and for $m = m_d \to +\infty$.

Proof of theorem 4.1 is given in Section 6.

Though theorem 4.1 is valid with no restriction on $m$, it does not provide a
correct classification boundary if $m$ is large, i.e., $\log m \sim \gamma \log d$, $\gamma \in (0, 1)$, as in
Scenarios (C) and (D). The correct lower bound for large $m$ is given in the next
theorem.






Theorem 4.2 Consider Scenario (C) with $\beta^* = \beta/(1 - \gamma) \in (1/2, 1)$ and

$$a_d = \sigma x \sqrt{(\log d)/m}.$$

Assume that $\limsup_{d \to +\infty} x^* < \phi(\beta^*)$. Then successful classification is impossible.

Proof of theorem 4.2 is given in Section 6.

Recall that, by an elementary argument, under Scenario (C) and for $a_d$ as in
theorem 4.2, successful classification is impossible if $\beta > 1 - \gamma$ (cf. remark after
(2.1)). This is the reason why in theorem 4.2 we consider only $\beta < 1 - \gamma$.

4.2 Upper bounds for fixed m 

We now propose optimal classifiers attaining the lower bound of theorem 4.1 under
Scenario (A). First, we consider a procedure that attains the classification boundary
only for $\beta \in [3/4, 1)$ but has a simple structure. Introduce the statistics

$$M_0 = \max_{1 \le k \le d} SY^k, \qquad M = \max_{1 \le k \le d} SZ^k,$$

where

$$SY^k = \frac{1}{\sqrt{m}} \sum_{i=1}^{m} Y_i^k, \qquad SZ^k = \frac{1}{\sqrt{m + 1}} \Big( Z^k + \sum_{i=1}^{m} Y_i^k \Big). \qquad (4.3)$$

Define

$$\Lambda_M = \frac{M}{\max(\sqrt{2}\, \sigma T_d,\ M_0)}.$$

Taking a small $c_0 > 0$, consider the classifier of the form:

$$\psi^{max} = 1_{\{\Lambda_M > 1 + c_0\}}.$$
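
A minimal Python sketch of a maximum-based rule of this kind is given below. It assumes the statistics $M_0 = \max_k SY^k$, $M = \max_k SZ^k$ with $SY^k$, $SZ^k$ normalized by $\sqrt{m}$ and $\sqrt{m+1}$, and the ratio $\Lambda_M = M/\max(\sqrt{2}\sigma T_d, M_0)$ compared with $1 + c_0$; the function name, the default `c0 = 0.1` and the toy data are illustrative assumptions.

```python
import math


def max_classifier(Z, Y, sigma=1.0, c0=0.1):
    """Return 1 if Lambda_M > 1 + c0 and 0 otherwise."""
    d, m = len(Z), len(Y)
    T_d = math.sqrt(math.log(d))
    col_sums = [sum(Y[i][k] for i in range(m)) for k in range(d)]
    M0 = max(s / math.sqrt(m) for s in col_sums)                        # max of SY^k
    M = max((Z[k] + col_sums[k]) / math.sqrt(m + 1) for k in range(d))  # max of SZ^k
    Lambda_M = M / max(math.sqrt(2.0) * sigma * T_d, M0)
    return 1 if Lambda_M > 1.0 + c0 else 0


Y = [[3.0, 0.0, 0.0, 0.0]]                         # d = 4, m = 1
print(max_classifier([3.0, 0.0, 0.0, 0.0], Y))     # Z reinforces the maximum
print(max_classifier([0.0, 0.0, 0.0, 0.0], Y))     # Z dilutes the maximum
```

When $Z$ carries the same shift as the training sample, adding it to the column sums raises the normalized maximum (here by a factor $\sqrt{2}$); otherwise the extra normalization by $\sqrt{m+1}$ lowers it.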



Theorem 4.3 Consider Scenario (A). Let $\beta \in (0, 1)$ and (2.2) hold. Then, for
any $c_0 > 0$,

$$\lim_{d \to +\infty} \sup_{u \in U_{\beta, a_d}} E_{H_0}^{(u)}(\psi^{max}) = 0. \qquad (4.4)$$

If $\limsup_{d \to +\infty} x_1 < \phi_2(\beta)$, then, for any $c_0 > 0$,

$$\lim_{d \to +\infty} \sup_{u \in U_{\beta, a_d}} E_{H_1}^{(u)}(\psi^{max}) = 0. \qquad (4.5)$$

If $\liminf_{d \to +\infty} x_1 > \phi_2(\beta)$, then there exists $c_0 > 0$ such that

$$\lim_{d \to +\infty} \sup_{u \in U_{\beta, a_d}} E_{H_1}^{(u)}(1 - \psi^{max}) = 0. \qquad (4.6)$$

Proof of theorem 4.3 is given in Section 7.

Theorems 4.1 and 4.3 (cf. (4.4) and (4.6)) and the fact that $\phi(\beta) = \phi_2(\beta)$ for
$\beta \in [3/4, 1)$ imply that $\psi^{max}$ attains the classification boundary for $\beta \in [3/4, 1)$.
On the other hand, (4.5) implies that for $\beta \in (1/2, 3/4)$ (where $\phi(\beta) = \phi_1(\beta) < \phi_2(\beta)$)
the classifier $\psi^{max}$ does not do the correct job. Its maximal risk $\mathcal{R}_M$ is
asymptotically 1, which is larger than the risk 1/2 of the simple random guess. We
therefore introduce another classifier that has, however, a more involved structure.
Consider the statistics
Consider the statistics 

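$$L_0(t) = \sum_{k=1}^{d} \big( 1_{\{SY^k > t\sigma T_d\}} - \Phi(-tT_d) \big), \qquad \Lambda_0(t) = \frac{L_0(t)}{\sqrt{d\, \Phi(-tT_d)}},$$

$$L(t) = \sum_{k=1}^{d} \big( 1_{\{SZ^k > t\sigma T_d\}} - \Phi(-tT_d) \big), \qquad \Lambda(t) = \frac{L(t)}{\sqrt{d\, \Phi(-tT_d)}},$$

where $t \in \mathbb{R}$, $\Phi$ is the standard normal cumulative distribution function and the
statistics $SY^k$, $SZ^k$ are defined in (4.3). Consider the grid

$$t_l = lh, \quad l = 1, \dots, N, \quad t_N = \sqrt{2}\sigma, \qquad (4.7)$$

with a step $h > 0$ depending on $d$ and such that $h = o(1)$, $T_d h \to +\infty$. This
implies that $1 \ll N \ll T_d$ as $d \to +\infty$ (here and below $v_d \ll w_d$ for $v_d > 0$ and
$w_d > 0$ depending on $d$ means that $\lim_{d \to +\infty} v_d/w_d = 0$). Set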

A = max A (ti), A = max A(*j), A* = A , 
1</<AT i<z<at id + A 

where $H = H_d$ is such that

$$d^{bh} \le H \le d^{B} \qquad (4.8)$$

for any $B > 0$, $b > 0$ and any $d > d_0(B, b)$, where $d_0(B, b)$ is a constant depending
only on $B$ and $b$ (such an $H$ can always be determined depending on the choice of
$h$). Consider now the classifier of the form

C = H{A*>H}- 
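
The centred and normalized exceedance counts entering this classifier can be sketched in Python as follows. The sketch assumes the ratio form $\Lambda(t) = \sum_k (1_{\{S^k > t\sigma T_d\}} - \Phi(-tT_d)) / \sqrt{d\,\Phi(-tT_d)}$, applied to $S^k = SY^k$ for $\Lambda_0(t)$ and to $S^k = SZ^k$ for $\Lambda(t)$, and a grid $t_l = lh$. It does not implement the final combination $\Lambda^*$ or the threshold $H$; all function names are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def ratio_statistic(S, t, sigma=1.0):
    """Centred count of exceedances of t*sigma*T_d, divided by sqrt(d*Phi(-t*T_d))."""
    d = len(S)
    T_d = math.sqrt(math.log(d))
    tail = normal_cdf(-t * T_d)                    # Phi(-t T_d)
    L = sum((1.0 if s_k > t * sigma * T_d else 0.0) - tail for s_k in S)
    return L / math.sqrt(d * tail)


def max_over_grid(S, h, t_max, sigma=1.0):
    """Maximum of the ratio statistic over the grid t_l = l*h, l = 1..N, N*h <= t_max."""
    N = int(t_max / h)
    return max(ratio_statistic(S, l * h, sigma) for l in range(1, N + 1))


S = [10.0, 0.0, 0.0, 0.0]                          # one large coordinate, d = 4
print(ratio_statistic(S, 1.0))
print(max_over_grid(S, h=0.5, t_max=1.0))
```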

Theorem 4.4 Consider Scenario (A) with (3 G (1/2, 1) and assume K2.2\) . Then 

hm sup E${P m )=0. (4.9) 

7/liminf d ^ +00 Xi > 0(/3) and lim sup d ^ +00 x < v^; ^en 

hm sup E%>(l-^) = 0. (4.10) 

Proof of theorem 4.4 is given in Section 7.

Theorems 4.1, 4.3 and 4.4 show that the classification boundary for highly
sparse vectors (i.e., for $\beta \in (1/2, 1)$) is given by (1.7). Furthermore, the classifier
of theorem 4.4 is optimal (attains the classification boundary) for $\beta \in (1/2, 1)$, except for the
case $\limsup_{d \to +\infty} x_0 \ge \sqrt{2}$, which is already covered by the classifier $\psi^{max}$. Indeed,
$x_0 \ge \sqrt{2}$ implies that $x_1 \ge \sqrt{2}(1 + 1/m) > \phi_2(\beta)$ for all $\beta \in (1/2, 1)$.

4.3 Upper bounds for $m \to +\infty$, $\log m = o(\log d)$

In this subsection we analyse Scenario (B). Then $m = m_d \to +\infty$, $\log m = o(\log d)$
as d — > +00 and the classifier is not, in general, optimal. Nevertheless, we pro- 
pose another classifier ip^, which attains essentially the same classification bound- 
ary as in Subsection 14.21 above. Introduce the statistics 



A(t) = . 1 Z k li SY k >t(7TA , A = max Aft 

U Vy/d$(-tr d ) ^ {SY>t ^ 1<1<N ^ 



where the maximum is taken over the grid (4.7). Here and below we use the
same notation $\Lambda(t)$, $\Lambda$ as previously for different ratio statistics, since it causes no
ambiguity. Set also

d 
k=l 



and define 



A So = ^3^757 lC = %s.>20> (4.11) 



where H satisfies ( 14. 8ft . 

Theorem 4.5 Consider Scenario (B). Let $\beta \in (1/2, 1)$ and let (2.2) hold. Then



lim sup Sg(^)=0. (4.12) 



//liminf^+oo 2 > </>(/?), i/ien 

lim sup Sg(l-V^) = 0. (4.13) 

Proof of theorem 4.5 is given in Section 7.

4.4 Upper bound for Scenario (C) 

We now suggest an asymptotically optimal classifier for Scenario (C). For $t > 0$ we
introduce the statistics

L x {t) = 1 ^2z k l {SY k >(TtTd} , L°(t) = ^2 %{SY*>>aar d }> =- 

k=l k=l 

Take a grid t\, . . . , t/v of the form (14.71) and define the classifier 
ipoo = I{A>4AT|, where A = max ACtA. 

1 s KKN 



ay/N* + L°(ty 
Theorem 4.6 Consider Scenario (C) with $a_d = \sigma x \sqrt{(\log d)/m}$. Let $\beta^* = \beta/(1 - \gamma) \in (1/2, 1)$
and let (2.2) hold. Then

lim sup EP (^ oo )= 0. (4.14) 
d-++oc uei/^ 

lim sup Eg(l-Voo) = 0. (4.15) 

d^+oo ueu 0>ad 

Proof of theorem 4.6 is given in Section 7.



5 Extensions 

5.1 Unknown variances 

The classifiers proposed in the previous section can be easily extended to the model
with unknown variance $\sigma^2$, so that the results of theorems 4.3, 4.4, 4.5 and 4.6
remain valid. We present here the general lines of such a modification without
going into the details of the proofs that do not differ much from those in Section 7.
First, note that there exists an estimator $\hat\sigma_d^2$ satisfying

a 2 = a 2 + Vd , (5.1) 
where r] d — > in P^-probability, and 

a\ = a 2 + 0(d^a 2 d ) + (l + d- l3/2 a d ) 1 / 2 r l , d , (5.2) 

where t]' d — > in P^-probability, uniformly in u G Up >ad , as d — > +oo. 
For example, we can take the standard sample variance 

k=l 

Assume that rj* are i.i.d. jV(0, a 2 ) random variables with unknown a. Then fl5.ll) 
and (15.21) are satisfied. In fact, it is easy to see that 

k=l 

and analogously 

Varg W) = ^ Varg(^) = \ (2a" + 0{d^a 2 d )) = o(l + d^a 2 d ) 

as d — > +oo. Applying Chebyshev's inequality, we get ( 15.11) and ( 15.21) . We also 
note that these relations hold under much weaker assumptions than the normality 






of rjj. It suffices to have, for example, independent random variables rjj such that 
E(rg) = 0, E[(rf) 2 } = a 2 and max,-, k E[(rf) 4 } < +00. 

We now discuss how to modify the proposed classifiers using a d - For ip pre and 
ip max y we replace the unknown a in their definitions by &d and change y/2\ogd into 
y/b log d, b > 2 for ip pre . If R d = 0(1) (which is the case for highly sparse vectors 
under (14. ip ). then d~Pa d = o(l) and (15.11) implies that the ratio a d /cr is close to 
1 in pffl -probability as well. Therefore, for the study of the variance modified 
versions of classifiers ^pP re ^ max ) we can use not only (15.11) but also the fact that 
b\ = a 2 + fjd where fjd — > in P^-probability. Thus, the desired upper bounds for 
these classifiers follow in an easy way from the results in Section 14.21 

For the classifier vp^, we replace the statistics L (t), L(t), A (t), A(t) by 

d 

Lo(t) = y^(I{syfc>fT d | 

k=l 
d 

L{t) = ^2(l {SZ k >tTd} 
fe=l 

and we take a grid 

ti = lh, I = l,...,N,t N = V2a d + 0(h), 

with step h as in (14 .7p . The cardinality N of the grid thus becomes a random 
variable. However, the relation N = O(T^) holds true in probability under (15. ip . 
Note that the modified statistics Ao(t) and A(t) contain the additional factor 
Ait) = a/ &i—tTd/cr)/&i—tTd/<Jd) as compared to the original ones. If (15.11) holds, 
these factors are (in probability) of the form exp(o(Td)) uniformly in t = 0(1). 
Under Pjj, s = 0, 1, the expectations of the summands with Ek = in Lo(t) and 
Lit) vanish. The other elements of the proof for the modified statistics are similar 
to those in Section [JJ 

For the classifier t/£o, we replace the statistics A(t) and A by 

1 d 
cr d y/d<S>i-tTd/a d ) ~ 

with the same grid as above, and 

d 
k=l 

We make similar modifications for the classifier ip^. The arguments above are 
enough for the proof of Sections 17.31 17.41 to hold through. 

5.2 Non- Gaussian noise 

We now discuss an extension of our results to Scenario (D). The remarks on the
pre-classifier $\psi^{pre}$ in Section 2 and the proofs of the upper bounds in Section 7 are



I{Z*>tT d }), A (t) 

!{zfc>tr d }), A(t) = 



L (t) 



y/d${-tT d /a d y 
Lit) 

y/d$(-tT d /a d y 
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only based on the constraint (I2.2p and the following property of the tails of the 
Gaussian distribution: 

$$\log P(S\zeta_m > \sigma t) \sim -\frac{t^2}{2}, \qquad t \in [U_0, U_1] \ \text{ for } U_0 \to +\infty \text{ and } U_1 = O(T_d). \tag{5.3}$$

Here $S\zeta_m = m^{-1/2}\sum_{i=1}^m \zeta_i$, and $\zeta_i$ are i.i.d. $\mathcal{N}(0, \sigma^2)$ random variables.

Indeed, in Subsection 7.4 we can write $\Phi(-tT_d)$ as $P(S\zeta_m > \sigma t T_d)$. From (5.3)
we deduce

$$P(S\zeta_m > \sigma t T_d) = A_d\, d^{-(t_+)^2/2}, \qquad t_+ = \max(0, t),$$

where $A_d$ satisfies (7.4) for $t_+ = O(1)$. This is exactly the relation (7.3), which is
also the only property of the noise distribution needed for the proofs in Subsection
7.4.

If $m$ is large enough, relation (5.3) holds not only for the Gaussian $\zeta_i$. It suffices
to have the i.i.d. $\zeta_i$ with $E\zeta_i = 0$, $E(\zeta_i^2) = \sigma^2 > 0$ satisfying the Cramér condition:

$$\exists\, h_0 > 0 : \quad E\big(e^{h\zeta_i}\big) < +\infty, \quad \forall\, h \in (-h_0, h_0),$$

and $m \gg \log d$. In fact, using Theorem 5.23 in [13] we get that, under the Cramér
condition and for $t = o(\sqrt{m})$,

$$P(S\zeta_m > \sigma t) = \Phi(-t)\, \exp\!\Big(\frac{t^3}{\sqrt{m}}\, \lambda\Big(\frac{t}{\sqrt{m}}\Big)\Big) \Big\{1 + O\Big(\frac{t+1}{\sqrt{m}}\Big)\Big\},$$

where $\lambda(t)$ is the Cramér series. Inserting here the expression for the Cramér series
and the relation $\log\Phi(-t) = -t^2/2 - \log t + O(1)$ as $t \to +\infty$, we obtain

$$\log P(S\zeta_m > \sigma t) = -\frac{t^2}{2}\Big(1 + O\Big(\frac{t}{\sqrt{m}}\Big) + o(1)\Big) \sim -\frac{t^2}{2}$$

as $t \to +\infty$, $t = o(\sqrt{m})$. These remarks allow us to follow the proof of Theorem
4.6 in Section 7, leading to the next result.

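Theorem 5.1 Consider Scenario (D) with $a_d = \sigma x \sqrt{(\log d)/m}$. Let $\beta^* = \beta/(1 - \gamma) \in (1/2, 1)$ and let (2.2) hold. Then

$$\lim_{d\to+\infty}\ \sup_{u \in U_{\beta,a_d}} E_0^{(u)}(\psi_\infty) = 0. \tag{5.4}$$

If $\liminf_{d\to+\infty} x^* > \phi(\beta^*)$, then

$$\lim_{d\to+\infty}\ \sup_{u \in U_{\beta,a_d}} E_1^{(u)}(1 - \psi_\infty) = 0. \tag{5.5}$$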
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5.3 Adaptive procedures 



We have proposed several classifiers, which attain the classification boundary under 
various conditions on m, ad, (3. In order to obtain an adaptive procedure that attains 
this boundary simultaneously for several domains of m, ad, (3, it suffices to combine 
the classifiers in the following way. We start with the pre- classifier ip pre . If it 
outputs "No Decision", then we combine the classifiers ip hn , ip max and ^)* m using 
the Bonferroni device, i.e., our classifier will be max(i/) im ,^ mm ,^). This means 
that we allocate Z to the Py-population iff it is allocated to Py by at least one of 
the three classifiers. Analogously, if m — > +oo, then we classify by max(?/) lm ,^) 
or by ma.x(ifj hn ,ifj 00 ). 
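
The combination rule described above can be sketched in a few lines of Python. This is only an illustration: the function names, and the convention that a classifier returns 1 for the $P_Y$-population, 0 for the noise, and that the pre-classifier returns `None` for "No Decision", are assumptions made for the example and are not notation from the paper.

```python
from typing import Callable, Optional, Sequence

# Hypothetical conventions (not from the paper): a classifier maps the data
# to 1 ("allocate Z to the P_Y-population") or 0 ("noise"); the pre-classifier
# may also return None, which stands for "No Decision".
Classifier = Callable[[object], int]
PreClassifier = Callable[[object], Optional[int]]


def combine(pre: PreClassifier, others: Sequence[Classifier], data: object) -> int:
    """Apply the pre-classifier first; on "No Decision" take the maximum
    of the remaining classifiers (the Bonferroni device)."""
    decision = pre(data)
    if decision is not None:
        return decision
    # Z is allocated to P_Y iff at least one classifier allocates it there.
    return max(clf(data) for clf in others)
```

Taking the maximum means the combined rule errs under the noise hypothesis only if at least one component errs, so its type I error is bounded by the sum of the component errors.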



6 Proof of the lower bounds 

In this section we prove Theorems 3.2, 4.1 and 4.2. Without loss of generality
we consider only the case 7£ max = 1/2 (cf. (11.31) ). Observe that if a probability 
measure /i d on Wl d is such that 

AUp,o) = l + o(l) (6.1) 

as d — > +oo, then 



sup K m (tP) > max( f E^]{^)^ d (du), f ^(l-^rf^+ofl) 

> \{J EPM)Adu) + j < } (1 - iP)Adu)) + o(l) 

= \ J U + (1 - dP Ho + o(l) (6.2) 

where $P_{H_s}$, $s = 0, 1$, are the "posterior" probability measures defined by

$$P_{H_s}(A) = \int P_s^{(u)}(A)\, \mu_d(du)$$

for any Borel set $A$ of $(\mathbb{R}^d)^m \times \mathbb{R}^d$. In view of (6.2), if the likelihood ratio

$$L(Y, Z) = \frac{dP_{H_1}}{dP_{H_0}}$$

satisfies

$$L(Y, Z) \to 1 \quad \text{in } P_{H_0}\text{-probability} \tag{6.3}$$

as $d \to +\infty$, then the left-hand side of (1.3) is greater than or equal to 1/2. This
immediately entails (1.3) because the risk of the simple random guess classifier
equals 1/2.

Since $E_{H_0}(L(Y, Z) - 1)^2 = E_{H_0} L^2(Y, Z) - 1$, relation (6.3) holds if

$$\limsup_{d\to+\infty} E_{H_0} L^2(Y, Z) = 1. \tag{6.4}$$
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Based on these remarks, the proofs of theorems 13.21 and 14.11 will proceed by con- 
structing a prior measure /i d satisfying (16. ip and proving (16.41) . 

In this section we assume without loss of generality that $\sigma = 1$ and that the
constants $c$, $C$ in the definition of $U_{\beta,a_d}$ are such that $c < 1 < C$.

The prior measure that we choose here is of the form $\mu_d(du) = \prod_{k=1}^d \mu(du_k)$,
where $\mu = (1 - p)\delta_0 + p\,\delta_{a_d}$, $p = d^{-\beta}$, and $\delta_t$ is the Dirac mass at point $t \in \mathbb{R}$.
In other words, the prior measure corresponds to $u_k = a_d \varepsilon_k$ with i.i.d. Bernoulli
random entries $\varepsilon_k$ that take value 1 with probability $p = d^{-\beta}$ and value 0 with
probability $1 - d^{-\beta}$.

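Lemma 6.1 Let $0 < c < 1 < C < +\infty$. Then the prior measure $\mu_d$ defined above
satisfies (6.1).

Proof of Lemma 6.1. Set $G(u) = \sum_{k=1}^d u_k$. We have to check that

$$\mu_d\big(G(u) > a_d C d^{1-\beta}\big) \to 0, \qquad \mu_d\big(G(u) < a_d c\, d^{1-\beta}\big) \to 0.$$

Since

$$E_{\mu_d}(G(u)) = a_d\, d\, p = a_d d^{1-\beta}, \qquad \mathrm{Var}_{\mu_d}(G(u)) = a_d^2\, d\, p(1 - p) \sim a_d^2 d^{1-\beta} \quad \forall\, \beta \in (0, 1),$$

it follows from Chebyshev's inequality that

$$\mu_d\big(G(u) > a_d C d^{1-\beta}\big) \le \frac{1}{d^{1-\beta}(C - 1)^2} \to 0, \qquad \mu_d\big(G(u) < a_d c\, d^{1-\beta}\big) \le \frac{1}{d^{1-\beta}(1 - c)^2} \to 0$$

as $d \to +\infty$. □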
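
The concentration used in this proof can be illustrated numerically. The sketch below, in Python, evaluates the Chebyshev bound $1/(d^{1-\beta}(C-1)^2)$ and draws one realization of the number of nonzero entries under the Bernoulli prior. The helper names, the seed and the values $d = 10^5$, $\beta = 0.6$ are illustrative assumptions, not quantities from the paper.

```python
import random


def chebyshev_bound(d: int, beta: float, factor: float) -> float:
    """Bound 1 / (d^(1-beta) * (factor - 1)^2) on the probability that the
    number of nonzero entries leaves the band fixed by `factor`."""
    return 1.0 / (d ** (1.0 - beta) * (factor - 1.0) ** 2)


def count_nonzero(d: int, beta: float, rng: random.Random) -> int:
    """Draw d i.i.d. Bernoulli(p) entries with p = d^(-beta); count the ones."""
    p = d ** (-beta)
    return sum(1 for _ in range(d) if rng.random() < p)


rng = random.Random(0)          # fixed seed, for illustration only
d, beta = 100_000, 0.6          # illustrative values
expected = d ** (1.0 - beta)    # d * p = d^(1 - beta), about 100 here
observed = count_nonzero(d, beta, rng)
```

Since $G(u) = a_d \sum_k \varepsilon_k$, the count of nonzero entries is $G(u)/a_d$, so a count staying between $c\,d^{1-\beta}$ and $C\,d^{1-\beta}$ corresponds to the two events bounded in the proof.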

It remains now to prove that (6.4) holds under the assumptions of Theorems 3.2
and 4.1.

We shall need some notation. For $a \in \mathbb{R}$ define the probability densities

$$f_a(Y^k) = \prod_{i=1}^m f(Y_i^k - a), \qquad f_a(Y^k, Z^k) = f_a(Y^k) f(Z^k - a). \tag{6.5}$$

Let $P_0 = \prod_{k=1}^d P_{0,k}$ be the probability measure that corresponds to the pure noise.
Here the measure $P_{0,k}$ has the density $f_0(Y^k, Z^k) = f_0(Y^k) f(Z^k)$ and $E_{0,k}(\cdot)$ denotes
the expectation under $P_{0,k}$.

Next, write $P_{H_s} = \prod_{k=1}^d P_{H_s,k}$, $s = 0, 1$, where the probability measures
$P_{H_s,k}$, $s = 0, 1$, have the densities

$$f_{H_s,k}(Y^k, Z^k) = (1 - p) f_0(Y^k) f(Z^k) + p f_{a_d}(Y^k) f(Z^k - s a_d).$$

We denote by $E_{H_s,k}$, $s = 0, 1$, the corresponding expectations. The measures $P_{H_s}$
have the following densities:

$$f_{H_s}(Y, Z) = \prod_{k=1}^d f_{H_s,k}(Y^k, Z^k).$$
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The likelihood ratio is of the form 



$$L(Y, Z) = \frac{dP_{H_1}}{dP_{H_0}} = \prod_{k=1}^d L_k(Y^k, Z^k)$$

where

$$L_k(Y^k, Z^k) = \frac{(1 - p) + pL(Y^k, Z^k)}{(1 - p) + pL(Y^k)} = \frac{1 + p(L(Y^k, Z^k) - 1)}{1 + p(L(Y^k) - 1)},$$

and we set

$$L(Y^k) = \prod_{i=1}^m \ell_{a_d}(Y_i^k), \qquad L(Y^k, Z^k) = L(Y^k)\, \ell_{a_d}(Z^k),$$

where $\ell_{a_d}(t) = f(t - a_d)/f(t)$. It will be convenient to write $L_k$ in the form

$$L_k(Y^k, Z^k) = 1 + \Delta_k, \qquad \Delta_k = \frac{pL(Y^k)(\ell_{a_d}(Z^k) - 1)}{1 + p(L(Y^k) - 1)}. \tag{6.6}$$

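6.1 Proof of Theorem 3.2

Recall that $dP_{H_0,k}/dP_{0,k} = 1 + p(L(Y^k) - 1)$. Since $E_{H_0,k}(\Delta_k) = 0$, we obtain

$$\begin{aligned}
E_{H_0}(L^2(Y, Z)) &= \prod_{k=1}^d \big(1 + E_{H_0,k}\Delta_k^2\big) \le \exp\Big(\sum_{k=1}^d E_{H_0,k}\Delta_k^2\Big) \\
&= \exp\Big(\sum_{k=1}^d E_{0,k}\big[\Delta_k^2\,(1 + p(L(Y^k) - 1))\big]\Big) \\
&\le \exp\Big(\frac{p^2}{1 - p} \sum_{k=1}^d E_{0,k}L^2(Y^k)\, E_{0,k}(\ell_{a_d}(Z^k) - 1)^2\Big) \\
&= \exp\Big(\frac{d p^2}{1 - p}\, D_{a_d}^m (D_{a_d} - 1)\Big),
\end{aligned} \tag{6.7}$$

where

$$D_{a_d} = \int \ell_{a_d}^2(t) f(t)\, dt = \int \frac{f^2(t - a_d)}{f(t)}\, dt.$$

Since $p = d^{-\beta} \to 0$, relation (6.4) holds if

$$D_d(m, a_d, \beta) = d^{1 - 2\beta} D_{a_d}^m (D_{a_d} - 1) \to 0. \tag{6.8}$$

This completes the proof of Theorem 3.2.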



6.2 Proof of Theorem 4.1

Assume w.l.o.g. that $\sigma = 1$. Then $f$ is the standard normal density, and thus
$D_a = e^{a^2}$. We shall assume that $x_1$ is fixed; the general case can be treated in a
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similar way by passing to subsequences xx )d x i > 0. By (11.61) . the condition 
( 16.81) takes the form 

$$d^{1 - 2\beta} \exp((m + 1)a_d^2) = d^{1 - 2\beta + x_1^2} \to 0. \tag{6.9}$$

In other terms, the proof of Theorem 3.2 implies that successful classification is
impossible if

$$x_1^2 - 2\beta + 1 < 0. \tag{6.10}$$
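
The Gaussian identity $D_a = e^{a^2}$ behind this condition can be checked numerically. The following Python sketch approximates $\int f^2(t - a)/f(t)\,dt$ for the standard normal density by the trapezoidal rule. The function names, the integration window and the number of steps are choices made for this illustration only.

```python
import math


def normal_pdf(t: float) -> float:
    """Standard normal density f(t)."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def D(a: float, half_width: float = 12.0, steps: int = 20000) -> float:
    """Trapezoidal approximation of the integral of f(t - a)^2 / f(t) dt.

    The integrand equals exp(a^2) times a normal density centred at 2a,
    so the window is centred there.
    """
    lo, hi = 2.0 * a - half_width, 2.0 * a + half_width
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        t = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * normal_pdf(t - a) ** 2 / normal_pdf(t)
    return total * h
```

For example, `D(0.5)` should be close to `math.exp(0.25)`; this follows from completing the square, $-(t - a)^2 + t^2/2 = -(t - 2a)^2/2 + a^2$.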

This bound applies for any $\beta \in (1/2, 1)$, and it yields the result of Theorem 4.1 for
$\beta \in (1/2, 3/4]$. It remains to show that a bound better than (6.10) can be obtained
for $\beta \in (3/4, 1)$, namely

$$x_1^2 - 2\big(1 - \sqrt{1 - \beta}\big)^2 < 0. \tag{6.11}$$

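In order to prove this, set

$$SY^k = \sum_{j=1}^m Y_j^k, \qquad SZ^k = SY^k + Z^k, \qquad k = 1, \dots, d, \qquad T_{l,d} = \sqrt{2 l \log d},$$

and introduce the events

$$\mathcal{A}_{SY,k} = \{SY^k < T_{m,d}\}, \qquad \mathcal{A}_{SZ,k} = \{SZ^k < T_{m+1,d}\},$$

$$\mathcal{A}_{SY} = \bigcap_{k=1}^d \mathcal{A}_{SY,k}, \qquad \mathcal{A}_{SZ} = \bigcap_{k=1}^d \mathcal{A}_{SZ,k}.$$

Observe that since

$$P_{0,k}(SY^k > T_{m,d}) = P_{0,k}(SZ^k > T_{m+1,d}) = \Phi\big(-\sqrt{2\log d}\big) = o(d^{-1}),$$

we have

$$P_0(\mathcal{A}_{SY}) \to 1, \qquad P_0(\mathcal{A}_{SZ}) \to 1.$$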

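Moreover,

$$\begin{aligned}
P_{H_0,k}(SY^k > T_{m,d}) &= P_{H_1,k}(SY^k > T_{m,d}) \\
&= (1 - p) P_{0,k}(SY^k > T_{m,d}) + p P_{0,k}(SY^k > T_{m,d} - m a_d), \\
p P_{0,k}(SY^k > T_{m,d} - m a_d) &= d^{-\beta}\, \Phi\big(a_d\sqrt{m} - \sqrt{2\log d}\big) \\
&\le d^{-\beta}\, \Phi\big(a_d\sqrt{m + 1} - \sqrt{2\log d}\big) \asymp \frac{d^{-\theta}}{\sqrt{\log d}} = o(d^{-1}),
\end{aligned}$$

where $\theta = \beta + (\sqrt{2} - x_1)^2/2 > 1$ in view of (6.11). Analogously, we have for $s = 0, 1$

$$\begin{aligned}
P_{H_s,k}(SZ^k > T_{m+1,d}) &= (1 - p) P_{0,k}(SZ^k > T_{m+1,d}) + p P_{0,k}(SZ^k > T_{m+1,d} - (m + s)a_d), \\
p P_{0,k}(SZ^k > T_{m+1,d} - m a_d) &\le p P_{0,k}(SZ^k > T_{m+1,d} - (m + 1)a_d) \\
&= d^{-\beta}\, \Phi\big(a_d\sqrt{m + 1} - \sqrt{2\log d}\big) \asymp \frac{d^{-\theta}}{\sqrt{\log d}} = o(d^{-1}).
\end{aligned}$$

Thus,

$$P_{H_s}(\mathcal{A}_{SY}) \to 1, \qquad P_{H_s}(\mathcal{A}_{SZ}) \to 1, \qquad s = 0, 1, \tag{6.12}$$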
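as $d \to +\infty$. Set $\tilde L_k(Y^k, Z^k) = L_k(Y^k, Z^k)\, \mathbb{1}_{\{\mathcal{A}_{SY,k} \cap \mathcal{A}_{SZ,k}\}}$, $\tilde\Delta_k = \Delta_k \mathbb{1}_{\{\mathcal{A}_{SY,k} \cap \mathcal{A}_{SZ,k}\}}$,
where $\Delta_k$ is defined by (6.6), and $\tilde L(Y, Z) = \prod_{k=1}^d \tilde L_k(Y^k, Z^k)$.
Using (6.12) we get that the main term in (6.2) satisfies

$$\begin{aligned}
\int (\psi + (1 - \psi)L(Y, Z))\, dP_{H_0} &= \int (\psi + (1 - \psi)L(Y, Z))\, \mathbb{1}_{\{\mathcal{A}_{SY} \cap \mathcal{A}_{SZ}\}}\, dP_{H_0} + o(1) \\
&= \int (\psi + (1 - \psi)\tilde L(Y, Z))\, dP_{H_0} + o(1)
\end{aligned}$$

as $d \to +\infty$. Repeating the argument after (6.2) we see that to prove the theorem
it suffices to show that

$$\tilde L(Y, Z) \to 1 \quad \text{in } P_{H_0}\text{-probability}. \tag{6.13}$$

Using (6.12) we obtain that $E_{H_0}\tilde L(Y, Z) = P_{H_1}(\mathcal{A}_{SY} \cap \mathcal{A}_{SZ}) \to 1$. Therefore, to
show (6.13) it suffices to prove that (cf. (6.4)):

$$\limsup_{d\to+\infty} E_{H_0} \tilde L^2(Y, Z) = 1. \tag{6.14}$$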
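We now prove (6.14). First note that, as follows from the displays preceding (6.12),

$$E_{H_0,k}(\tilde\Delta_k) = P_{H_1,k}(\mathcal{A}_{SY,k} \cap \mathcal{A}_{SZ,k}) - P_{H_0,k}(\mathcal{A}_{SY,k} \cap \mathcal{A}_{SZ,k}) = o(d^{-1}),$$

and

$$0 \le \tilde L_k(Y^k, Z^k) \le 1 + \tilde\Delta_k.$$

Therefore, arguing as in (6.7) we obtain

$$\begin{aligned}
E_{H_0}(\tilde L^2(Y, Z)) &= \prod_{k=1}^d E_{H_0,k}\big(\tilde L_k^2(Y^k, Z^k)\big) \le \prod_{k=1}^d \big(1 + E_{H_0,k}\tilde\Delta_k^2 + 2E_{H_0,k}\tilde\Delta_k\big) \\
&\le \exp\Big(\sum_{k=1}^d E_{H_0,k}\tilde\Delta_k^2 + 2\sum_{k=1}^d E_{H_0,k}\tilde\Delta_k\Big) \\
&= \exp\Big(\sum_{k=1}^d E_{0,k}\big[\tilde\Delta_k^2\,(1 + p(L(Y^k) - 1))\big] + o(1)\Big) \\
&\le \exp\Big(\frac{p^2}{1 - p} \sum_{k=1}^d E_{0,k}\big[L^2(Y^k)(\ell_{a_d}(Z^k) - 1)^2\, \mathbb{1}_{\{\mathcal{A}_{SY,k} \cap \mathcal{A}_{SZ,k}\}}\big] + o(1)\Big) \\
&= \exp\Big(\frac{d p^2 A}{1 - p} + o(1)\Big),
\end{aligned} \tag{6.15}$$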
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where A = E 0>1 [L 2 (Y 1 )(^ £5d (Z 1 ) - l) 2 n { ^ in ^ z , l} ] • Observe that 

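$$A \le B + C, \qquad B = E_{0,1}\big(L^2(Y^1)\ell_{a_d}^2(Z^1)\mathbb{1}_{\{\mathcal{A}_{SZ,1}\}}\big), \qquad C = E_{0,1}\big(L^2(Y^1)\mathbb{1}_{\{\mathcal{A}_{SY,1}\}}\big).$$

Setting $b_l = a_d\sqrt{l}$ with $l = m$ or $m + 1$, $T_d = \sqrt{2\log d}$, we can write

$$B = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{T_d} \exp\big(-b_{m+1}^2 + 2b_{m+1}z - z^2/2\big)\, dz = e^{b_{m+1}^2}\, \Phi(T_d - 2b_{m+1}),$$

and analogously,

$$C = e^{b_m^2}\, \Phi(T_d - 2b_m).$$

Recall that we consider $\beta \in (3/4, 1)$ under assumption (6.11). Thus,

$$1/2 < 2\beta - 1 < (m + 1)s^2 < 2\big(1 - \sqrt{1 - \beta}\big)^2. \tag{6.16}$$

Next, by (1.6),

$$-T_d + 2b_{m+1} = \sqrt{2\log d}\, \big(\sqrt{2}\, s\sqrt{m + 1} - 1\big) = \sqrt{2\log d}\, \big(\sqrt{2}\, x_1 - 1\big).$$

Thus, for $1/\sqrt{2} < x_1 < \sqrt{2}$ we have

$$d p^2 B = d p^2 e^{b_{m+1}^2}\, \Phi(T_d - 2b_{m+1}) \asymp \frac{d^{1 - 2\beta + x_1^2 - (\sqrt{2}x_1 - 1)^2}}{\sqrt{\log d}} = \frac{d^{2(1 - \beta) - (\sqrt{2} - x_1)^2}}{\sqrt{\log d}}.$$

Here the exponent is $2(1 - \beta) - (\sqrt{2} - x_1)^2 < 0$ in view of the last inequality in
(6.16). Therefore $d p^2 B = o(1)$ as $d \to +\infty$. In order to control $d p^2 C$ observe that
the function $b \mapsto e^{b^2}\Phi(T - 2b)$ is increasing for $b$ large enough and $T > b$. Therefore
$C \le B$ for $d$ large enough and $d p^2 C = o(1)$ as well. Thus $d p^2 A = o(1)$ as $d \to +\infty$,
and (6.14) follows. This completes the proof of Theorem 4.1.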

6.3 Proof of Theorem 4.2

Assume w.l.o.g. that $\sigma = 1$. By assumptions of the theorem, $\log m \sim \gamma \log d$, $\gamma \in (0, 1)$ and

$$\beta \in ((1 - \gamma)/2,\ 1 - \gamma), \qquad a = a_d = x\sqrt{\log(d)/m}, \qquad x = O(1). \tag{6.17}$$

In view of the first two lines of (6.7), it suffices to show that

$$\sum_{k=1}^d E_{H_0,k}\Delta_k^2 = d\, E_{H_0,1}\Delta_1^2 = o(1). \tag{6.18}$$

Set

$$\Delta_{Y^1} = \frac{pL(Y^1)}{1 - p + pL(Y^1)}$$
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and observe that 

$$\Delta_{Y^1} \le (1 - p)^{-1} \min(1, pL(Y^1)). \tag{6.19}$$

Next, by definition,

$$L(Y^1) = \exp\big(-ma^2/2 + \sqrt{m}\, a\, SY^1\big) = d^{-x^2/2 + x\, SY^1/\sqrt{\log d}}, \qquad SY^1 = m^{-1/2} \sum_{i=1}^m Y_i^1.$$

Take a threshold $H_* = t\sqrt{\log d}$ such that $pL(Y^1) = 1$ when $SY^1 = H_*$, i.e.,

$$-\beta - x^2/2 + xt = 0, \qquad t = x/2 + \beta/x.$$

Then $pL(Y^1) < 1$ (respectively, $pL(Y^1) > 1$) is equivalent to $SY^1 < H_*$ (respectively,
$SY^1 > H_*$).

Since Z 1 , Y^ are independent and, by the condition limsup d ^ +00 x* < <fi(f3*), the 
values are bounded uniformly in d we have Eh 1 (i^iZ 1 ) — l) 2 = e a <* — 1 < cqO 2 , 
where Co is a constant. Therefore, using (16.191) we find 

E Hoi Al = E H(hl (£ ad (Z l )-l) 2 E P A 2 Yl <c a 2 d E P A 2 Yl 

< ^- (p 2 E P (L 2 (Y 1 )I{ pi(Y i)< 1 }) + P(pL(Y 1 ) > 1)) , 

where P = (l—p)Po +pP a , a = a-d for brevity, P a is the Gaussian measure with the 
density /„(•), cf. (163|1 . Note that SY 1 ~ Af (0, 1) under P and SY 1 ~ jV(0, v^a) 
under P a . Therefore 

£ (L 2 (Y 1 )I {?3L(Y1) < 1} ) = £ Po (L 2 (Y 1 ) l{ P i(Yi)<i}) +pPp a (L 2 (Y 1 )I {pL(Y1) < 1} ) , 
P(pL(Y 1 )>l) = (l-p)P (SY 1 > H*)+pP a (SY l > H*) 
= (l-p) c + A(l-p), 

where 

c = = AT i2/2 , A = p$(v^a - #*) = AtT^*-*^ 2 , 

and A is a logarithmic factor: bilogd)^ 1 ^ 2 < A < B(\ogd) 1 ^ 2 for some positive 
constants b, B. It is easy to see that c < A and c = A\ as \/mfl < H*. 
Since L(Y X ) = ^(Y 1 ) we get 

E Po [L 2 (Y 1 ) 1{ p l( Y i)<i}) = -7= / exp(-ma 2 + 2a»)cte = e ma2 $(#, - 2^a), 
and 

p£ Pa (L 2 (Y 1 )]I{ pi ( Y i)< 1 }) =pE Po (L 3 (Y 1 )E{ pL ( Y i)< 1 }) < E Po (L 2 (Y 1 )]I{ pi ( Y i)< 1 }) . 
Therefore 

.9 2a 2 

/?„.. . A 2 < 

1 — p 
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- 2g(l + o(l)) 
E Ho ,Ai < ; i u + A), 



where 



u = p 



It is easily seen that u = 0(A) for if* < ^/ma, i.e., for t < x, which is equivalent 
to x 2 > 2(3. Also A = 0{u) for > 2 v /ma, i.e., for t > 2x, which is equivalent 
to x 2 < 2/3/3. If ^/ma < < 2y / ma, i.e., if x < t < 2x, then u = AX = Ac; cf. 
[9], pp. 295-296. The conditions x < t < 2x are equivalent to 2/5/3 < x 2 < 2(3. 
Therefore we get 

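$$d\, E_{H_0,1}\Delta_1^2 = A\, d^{\nu_d}, \qquad \nu_d = -\gamma + 1 + \begin{cases} -\beta, & x^2 > 2\beta, \\ -t^2/2, & 2\beta/3 < x^2 < 2\beta, \\ -2\beta + x^2, & 0 < x^2 < 2\beta/3. \end{cases} \tag{6.20}$$

Thus the relation (6.18) holds true as

$$\liminf \nu_d < 0. \tag{6.21}$$

Set

$$\beta^* = \beta/(1 - \gamma), \qquad x^* = x/\sqrt{1 - \gamma}, \qquad t^* = x^*/2 + \beta^*/x^*. \tag{6.22}$$

Then the condition (6.21) is equivalent to $\liminf_{d\to+\infty} \nu_d^* < 0$ where

$$\nu_d^* = \nu_d/(1 - \gamma) = 1 + \begin{cases} -\beta^* & \text{as } (x^*)^2 > 2\beta^*, \\ -(t^*)^2/2 & \text{as } 2\beta^*/3 < (x^*)^2 < 2\beta^*, \\ -2\beta^* + (x^*)^2 & \text{as } 0 < (x^*)^2 < 2\beta^*/3. \end{cases} \tag{6.23}$$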

The relations (16.231) imply that successful classification is impossible as 
limsup d ^ +00 x* - </>(/?*) < where </>(/?*) is defined by flTB) for (3* G (1/2, 1). 
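
The link between the piecewise exponent in (6.23) and the boundary function can be checked numerically. The Python sketch below assumes that the boundary has the form suggested by (6.10) and (6.11), namely $\sqrt{2\beta - 1}$ for $\beta \in (1/2, 3/4]$ and $\sqrt{2}(1 - \sqrt{1 - \beta})$ for $\beta \in (3/4, 1)$; this form, like the function names, is an assumption of the example rather than a definition quoted from the paper.

```python
import math


def nu_star(x: float, beta: float) -> float:
    """Piecewise exponent of (6.23), with x standing for x* and beta for beta*."""
    x2 = x * x
    if x2 >= 2.0 * beta:
        inner = -beta
    elif x2 > 2.0 * beta / 3.0:
        t = x / 2.0 + beta / x
        inner = -t * t / 2.0
    else:
        inner = -2.0 * beta + x2
    return 1.0 + inner


def phi(beta: float) -> float:
    """Assumed boundary, inferred from (6.10) and (6.11)."""
    if beta <= 0.75:
        return math.sqrt(2.0 * beta - 1.0)
    return math.sqrt(2.0) * (1.0 - math.sqrt(1.0 - beta))


def agrees(beta: float, x: float) -> bool:
    """True when 'exponent negative' and 'x below the boundary' coincide."""
    return (nu_star(x, beta) < 0.0) == (x < phi(beta))
```

On a grid of values $\beta \in (1/2, 1)$ and $x > 0$ that avoids the boundary itself, `agrees` is expected to return `True` throughout, which is the equivalence used in the last step of the proof. Note that the two branches of `phi` match at $\beta = 3/4$, where both equal $1/\sqrt{2}$.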



7 Proof of the upper bounds 

In this section we prove Theorems 4.3 - 4.6. Without loss of generality, we shall
assume throughout that $\sigma = 1$. We shall consider that $s$ is fixed in Theorems 4.3
and 4.4 and that $x$ is fixed in Theorems 4.5 and 4.6. The general case can be treated in
a similar way by passing to subsequences $s_d \to s > 0$, $x_d \to x > 0$. Sometimes we
shall set for brevity (and without loss of generality) $c = 1$ or $C = 1$ where $c$ and $C$
are the constants in the definition of $U_{\beta,a_d}$.

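7.1 Proof of Theorem 4.3

Note first that, for any $\delta > 0$, uniformly in $u \in U_{\beta,a_d}$,

$$P^{(u)}\big(|M_0 - h(x)\sqrt{\log d}| > \delta\big) \to 0, \tag{7.1}$$
$$P_s^{(u)}\big(|M - h(x_s)\sqrt{\log d}| > \delta\big) \to 0, \qquad s = 0, 1, \tag{7.2}$$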
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as d — > +00, where P^> denotes the distribution of Y, the notation x,x ,Xi is 
defined in g^D and h{t) = max( v / 2,t + - 0) ). Indeed, setting T(x) = 

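$h(x)\sqrt{\log d} \ge \sqrt{2\log d}$, for any $\delta > 0$ we obtain

$$\begin{aligned}
P^{(u)}(M_0 > T(x) + \delta) &\le \sum_{k=1}^d P^{(u)}(SY^k > T(x) + \delta) \le d\,\Phi(-T(x) - \delta) + \sum_{k:\, \varepsilon_k = 1} \Phi\big(a_d\sqrt{m} - T(x) - \delta\big) \\
&\le o(1) + C d^{1-\beta}\, \Phi\big(-\sqrt{2(1 - \beta)\log d} - \delta\big) = o(1)
\end{aligned}$$

as $d \to +\infty$. Next,

$$P^{(u)}(M_0 < T(x) - \delta) = \prod_{k=1}^d \big(1 - P^{(u)}(SY^k > T(x) - \delta)\big) \le \exp\Big(-\sum_{k=1}^d P^{(u)}(SY^k > T(x) - \delta)\Big)$$

and

$$\sum_{k=1}^d P^{(u)}(SY^k > T(x) - \delta) \ge (d - C d^{1-\beta})\,\Phi(-T(x) + \delta) + c\, d^{1-\beta}\, \Phi\big(a_d\sqrt{m} - T(x) + \delta\big).$$

If $h(x) = \sqrt{2}$, then $(d - C d^{1-\beta})\,\Phi(-T(x) + \delta)$ tends to $+\infty$ as $d \to +\infty$. If
$h(x) > \sqrt{2}$, then

$$c\, d^{1-\beta}\, \Phi\big(a_d\sqrt{m} - T(x) + \delta\big) = c\, d^{1-\beta}\, \Phi\big(-\sqrt{2(1 - \beta)\log d} + \delta\big) \to +\infty$$

as $d \to +\infty$. This proves (7.1). The proof of (7.2) is analogous.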

It follows from (7.1) - (7.2) that if $x_1 \le \phi_2(\beta)$ (which is the same as $h(x_1) = \sqrt{2}$,
implying $h(x) = h(x_0) = \sqrt{2}$), then $\Lambda_M \le 1 + \delta$, for any $\delta > 0$, with both $P_0^{(u)}$
and $P_1^{(u)}$ probabilities tending to 1 as $d \to +\infty$. Next, let $x_1 > \phi_2(\beta)$. Then
$h(x_1) > h(x) > h(x_0)$. This yields that $\Lambda_M \le 1 + \delta$ for any $\delta > 0$ with $P_0^{(u)}$
probability tending to 1 as $d \to +\infty$. Therefore, (4.4) and (4.5) follow. We finally
prove (4.6). Using (7.1) and (7.2) we get that, with $P_1^{(u)}$ probability tending to 1
as $d \to +\infty$,

$$\Lambda_M \ge \frac{h(x_1)\sqrt{\log d} - \delta}{h(x)\sqrt{\log d} + \delta} \ge \frac{h(x_1)}{h(x)}(1 - \delta) - \delta$$

for any $0 < \delta < 1$, where the last inequality is satisfied for any $d > 2$. Then (4.6)
holds, since we can always choose a small $c_0$ in the definition of $\psi^{max}$ and a small
$\delta$ such that

$$\frac{h(x_1)}{h(x)}(1 - \delta) - \delta > 1 + c_0.$$

Finally, note that all the bounds on the probabilities above are independent of $u$
and thus the convergence of the probabilities is uniform in $u \in U_{\beta,a_d}$ and in $(a_d, \beta)$
such that $h(x_1)/h(x)$ is bounded away from 1. This completes the proof.
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7.2 Proof of theorem 14.41 

Fix $u \in U_{\beta,a_d}$. We first analyse the expectations and the variances of the statistics
$L_0(t)$ and $L(t)$. Recall that $\Phi(-z) \asymp e^{-z^2/2}/z$, $z \to +\infty$, which implies

$$\Phi(-tT_d) = A_d\, d^{-(t_+)^2/2}, \qquad t_+ = \max(t, 0). \tag{7.3}$$

Here $A_d$ is a positive factor satisfying $A_d = O(1)$ and $A_d^{-1} = O(\sqrt{\log d})$ for $t = O(1)$
and $d \to +\infty$. In this proof and the proof of Theorem 4.5 below we assume a weaker
condition: $A_d$ is a quantity depending on $d$ (maybe different on different occasions)
such that

$$|\log A_d| = o(\log d) \tag{7.4}$$

as $d \to +\infty$. Arguing under this weaker condition will allow us to get the proof of
Theorem 5.1 in parallel with that of Theorem 4.5.
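
Condition (7.4) can be illustrated numerically in the Gaussian case. The Python sketch below computes $|\log A_d|/\log d$ with $A_d = \Phi(-tT_d)\, d^{t^2/2}$. It takes $T_d = \sqrt{\log d}$, which is an assumption inferred from the form of (7.3); the function names and the chosen values of $d$ and $t$ are illustrative only.

```python
import math


def Phi_neg(z: float) -> float:
    """Standard normal tail Phi(-z), via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def log_A_ratio(d: float, t: float) -> float:
    """|log A_d| / log d, where A_d = Phi(-t * T_d) * d^(t^2 / 2).

    Assumption: T_d = sqrt(log d), chosen so that (7.3) holds.
    """
    log_d = math.log(d)
    T_d = math.sqrt(log_d)
    log_A = math.log(Phi_neg(t * T_d)) + 0.5 * t * t * log_d
    return abs(log_A) / log_d


# The ratio should shrink as d grows, in line with |log A_d| = o(log d).
ratios = [log_A_ratio(d, 1.0) for d in (1e3, 1e6, 1e12)]
```

The decay is slow because $A_d$ is of order $1/\sqrt{\log d}$, so the ratio behaves like $\log\log d/\log d$.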

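The expectations of $L_0(t)$ and $L(t)$ for any fixed $u \in U_{\beta,a_d}$ and $h \le t \le \sqrt{2}$
satisfy

$$\begin{aligned}
E^{(u)} L_0(t) &= d^{1-\beta}\big(\Phi(a_d\sqrt{m} - tT_d) - \Phi(-tT_d)\big) = A_d\, d^{1-\beta}\big(d^{-((t - x)_+)^2/2} - d^{-t^2/2}\big), \\
E_s^{(u)} L(t) &= d^{1-\beta}\big(\Phi((m + s)a_d/\sqrt{m + 1} - tT_d) - \Phi(-tT_d)\big) = A_d\, d^{1-\beta}\big(d^{-((t - x_s)_+)^2/2} - d^{-t^2/2}\big),
\end{aligned}$$

where $s = 0, 1$ (recall that $h \le t_l \le \sqrt{2}$ for all $l$ in the considered grid). Note that
if $x > b$ for some constant $b > 0$, then in view of our assumptions on $h$ we have
$d^{-t^2/2} \le d^{-((t - x)_+)^2/2}/2$ for $t \ge h$ and all $d$ large enough. Therefore, for $x, x_s > b$ and
all $d$ large enough,

$$E^{(u)} \Lambda_0(t) = A_d\, d^{1/2 - \beta + t^2/4 - ((t - x)_+)^2/2}, \qquad E_s^{(u)} \Lambda(t) = A_d\, d^{1/2 - \beta + t^2/4 - ((t - x_s)_+)^2/2},$$

where $s = 0, 1$. Since the maximum of $t^2/4 - ((t - x)_+)^2/2$ in $0 \le t \le \sqrt{2}$ is
attained either at $t = 2x$ when $0 < x \le 1/\sqrt{2}$, or at $t = \sqrt{2}$ when $x > 1/\sqrt{2}$, we
have, for $x > b$ and all $d$ large enough,

$$\max_{h \le t \le \sqrt{2}} E^{(u)} \Lambda_0(t) = \begin{cases} A_d\, d^{1/2 - \beta + x^2/2}, & x \le 1/\sqrt{2}, \\ A_d\, d^{1 - \beta - ((\sqrt{2} - x)_+)^2/2}, & x > 1/\sqrt{2}. \end{cases} \tag{7.5}$$

Analogously, we have for $s = 0, 1$, $x_s > b$ and all $d$ large enough:

$$\max_{h \le t \le \sqrt{2}} E_s^{(u)} \Lambda(t) = \begin{cases} A_d\, d^{1/2 - \beta + x_s^2/2}, & x_s \le 1/\sqrt{2}, \\ A_d\, d^{1 - \beta - ((\sqrt{2} - x_s)_+)^2/2}, & x_s > 1/\sqrt{2}. \end{cases} \tag{7.6}$$

We shall need the exact asymptotics (7.5) and (7.6) only when $x > b$ and $x_s > b$
for some constants $b > 0$. For small $x$ and $x_s$ it will be enough for our purposes to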
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use the fact that the right-hand sides of (17.51) and (17. 6p constitute upper bounds 
for the corresponding left-hand sides for all x, x s > 0. 

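We now consider bounds for the corresponding variances:

$$\begin{aligned}
\mathrm{Var}^{(u)}(L_0(t)) &\le d\,\Phi(tT_d)\Phi(-tT_d) + d^{1-\beta}\, \Phi(tT_d - a_d\sqrt{m})\, \Phi(-tT_d + a_d\sqrt{m}), \\
\mathrm{Var}_s^{(u)}(L(t)) &\le d\,\Phi(tT_d)\Phi(-tT_d) \\
&\quad + d^{1-\beta}\, \Phi\big(tT_d - (m + s)a_d/\sqrt{m + 1}\big)\, \Phi\big(-tT_d + (m + s)a_d/\sqrt{m + 1}\big),
\end{aligned}$$

where $s = 0, 1$. Since for $x > 0$ the maximum of $t^2 - (t - x)^2$ in $0 \le t \le \sqrt{2}$ is
attained at $t = \sqrt{2}$,

$$\mathrm{Var}^{(u)}(\Lambda_0(t)) \le A_d\big(1 + d^{-\beta + (t^2 - ((t - x)_+)^2)/2}\big) \le A_d\big(1 + d^{1 - \beta - ((\sqrt{2} - x)_+)^2/2}\big) \le \begin{cases} A_d, & x \le \phi_2(\beta), \\ A_d\, d^{1 - \beta - ((\sqrt{2} - x)_+)^2/2}, & x > \phi_2(\beta), \end{cases}$$

$$\mathrm{Var}_s^{(u)}(\Lambda(t)) \le A_d\big(1 + d^{-\beta + (t^2 - ((t - x_s)_+)^2)/2}\big) \le A_d\big(1 + d^{1 - \beta - ((\sqrt{2} - x_s)_+)^2/2}\big) \le \begin{cases} A_d, & x_s \le \phi_2(\beta), \\ A_d\, d^{1 - \beta - ((\sqrt{2} - x_s)_+)^2/2}, & x_s > \phi_2(\beta), \end{cases}$$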

where s — 0, 1. Take iVo > such that Nq x T,j » iV. By Chebyshev's inequality, 
for each / = 1,...,N, each u G Up Ad and s = 0, 1, with Pj^-probability greater 
than 1 — 1/Nq we have 



E^AoM-No max V Var S A oW < < E^M^) + N o ™ax JVaxgA (t) 

0<t<-y/2 0<i<-\/2 v 

and these inequalities also valid for A(-) instead of A (-). All these inequal- 
ities (with A(-) and Ao(-)) simultaneously hold with probability greater than 
1 - 2iYV 2 iV -> 1 (uniformly in u G Uj3 yad ). On this event of high probability 
we can evaluate Ao and A by taking the maxima of the expectations and compar- 
ing them with the maxima of the square root of the variances. Proceeding in this 
way and using the bounds obtained above we find: 

f 0(A d ), x < MP), (3<3/4otx< MP), P > 3/4, 

A = I rfV2-/3+- 2 /2+o(h) ) 1/v /2 > X >MP), P< 3/4, 

(y-/3-((V2-*) + ) 2 /2+ow x > 1/v ^ ; p < 3/ 4 or x > p > 3/4 

with P ^-probability tending to 1 as d — > +oo, and, for s = 0, 1 : 

( 0(A d ), x s < MP), P < 3/4 or x s < MP), P > 3/4, 

A = I t /i/2-^+*2/a+o(ft) j t /^2 > x s > /3 < 3/4, 

[^i-/M(V2-x s ) + )2/2+o(h) ; Xs > p < 3/4 or ^ > ^ > 3/4 
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with Ph -probability tending to 1 as d — > +00 (the convergence of all the proba- 
bilities is uniform in u G Up tCLd ). Using these relations we get the following results. 
First, A* = o{H) with Pj^-probability tending to 1. Next, A* = o(H) with P^- 
probability tending to 1 (for s = 0, 1) if either x\ < <fii({3), (3 < 3/4 or x\ < 
02 (3 > 3/4. Furthermore, if xq < v2 — r for some small r > 0, and either 
xi > MP) + r, (3 < 3/4 or xi > MP) + r, (3 > 3/4, then with pjjg -probability 
tending to 1 we have A* > d CT 3> H for some c > 0. Clearly, the convergence of all 
the probabilities here is uniform in u G Up Ai . Thus, the theorem follows. 

7.3 Proof of Theorem 4.5

Fix u G Up >ad . Let m = m d — > +00 such that logm = o(logrf). Observe that 
ad cannot be "too large" in view of (12. 2p . Also a d cannot be "too small" since 

x = cid^/m/ logd > (f>{(3) > b for some b > 0. In particular, add 6 — > +00, for any 
5 > 0, so that ad satisfies a condition similar to (17.41) : 



We first analyse the statistic A(t). Clearly, E%A(t) = 0, since E%Z k = and 
Z k and SY k are independent. We also have E^ (Z k ) 2 = 1. Recalling (17.31) . we 



loga d | = o(logd). 



(7.7) 



obtain 




£>o 2 M) 



A 



max Var^A(t) = 1 + ^-/M^-oO+JVa 



0<t<v / 2 



JO(I), _ x<MP), 



Here and below is a factor satisfying (17.41) . Next, 



d 



-l/2-/3+t 2 /4-((t-x) + ) 2 /2 



y/d$(-tT d ) 



k=l:e k =l 



which yields 
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Analogously, 



d d 



VargA(t) = ^t^t( E $ ("^) + E 

^ d ' fc=l:e fc =0 fc=l:e fc =l 

+a2$(-(t-x)T,)$((t-x)T,))^ 

= l + A d {l + a 2 d )d^ +t2 ' 2 -^ x ^ 2 /\ 
D{{x, f3) = max Var^A(t) = 1 + A d (l + a 2) d i-/3-((V2-s) + ) 2 /2 

0<t<V2 1 



AJl+a 2 



0(1), x<fa(p\ 
d) \ di-/J-((V5-*)+) a /2 > x>0 2 (/3). 



Suppose that, for some small r > 0, 

/3 G [1/2 + r, 1 — r], ac > + r. (7.8) 

These relations and the inequality 1/2 — f3 + x 2 /2 > 1 — (3 — [\p2 — x) 2 /2 imply that 
under (17. 8p and (17. 7p we have, for some T\ > 0, r 2 > depending on r in (17.81) . 

Ds(x,0) < d- Tl E(x,{3), s = 0,l and E(x, (3) > d T ' 2 . 

Arguing as in the proof of theorem 14.41 above we obtain the following facts. First, 

|A| < A d D (x,/3) (7.9) 

with P^-probability tending to 1 as d — > +oo. Second, if x < (j)(/3), then 

|A| < AaDxfaP) (7.10) 

with P^-probability tending to 1 as d — > +oo. Finally, if (17.81) holds, then 

A>A d E(x,(3) (7.11) 

with Pj^ -probability tending to 1 as d — > +oo. Thus, with Pj^ -probability tending 
to 1, the ratio 

A(x,(3) = A = 

y/H + D*(x,0) 

is small. The same holds with -probability tending to 1 if x < 4>((3). To finish 
the proof, we show that these properties hold also for which differs from A(x,f3) 
only in that we replace Dq(x,P) by A* (note that A(x,@) is not a statistic, since 
D (x,f3) depends on the unknown parameters x, (3). The distribution of A* is the 
same under Pj^ and P^ , and depends only on the parameter u. We have 

d d 

EM(AJ = E H~^T d )+ E H~(V2-x)T d ) 

k=l:e k =0 k=l:e k =l 

= o(l) + A d d x -^ 2 -^V 2 , 
Var^(A*) < £^(A*). 
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These inequalities yield that H + A* = H + A^D^x, @) with probability tending to 
1, and the statistic has the properties that we have proved for A(x, (3). Finally, 
note that the convergence of all the probabilities in the above argument is uniform 
in u G U/3 t a d . Thus, the theorem follows. 



7.4 Proof of Theorem 4.6

For the statistics $L^1(t)$ we have

$$E_0^{(u)} L^1(t) = 0, \qquad E(t) = E_1^{(u)} L^1(t) = a_d\, d^{1-\beta}\, \Phi(a_d\sqrt{m} - tT_d) = A\, a_d\, d^{1 - \beta - ((t - x)_+)^2/2},$$

$$\mathrm{Var}_0^{(u)} L^1(t) = d(1 - p)\Phi(-tT_d) + d p\, \Phi(a_d\sqrt{m} - tT_d),$$
$$\mathrm{Var}_1^{(u)} L^1(t) = d(1 - p)\Phi(-tT_d) + d p (1 + a_d^2)\, \Phi(a_d\sqrt{m} - tT_d),$$

which yields

$$\mathrm{Var}_0^{(u)} L^1(t) \le 2R(t), \qquad \mathrm{Var}_1^{(u)} L^1(t) \le 3R(t)$$

with

$$R(t) = \max\big(d\,\Phi(-tT_d),\ d p\, \Phi(a_d\sqrt{m} - tT_d)\big).$$

Thus, for all $l = 1, \dots, N$, with $P_0^{(u)}$-probability tending to 1 the statistics $L^1(t_l)$ belong
to the intervals $[-N\sqrt{2R(t_l)},\ +N\sqrt{2R(t_l)}]$ and with $P_1^{(u)}$-probability tending
to 1 they belong to the intervals $[E(t_l) - N\sqrt{3R(t_l)},\ E(t_l) + N\sqrt{3R(t_l)}]$.

Consider the ratios A(t t ), I = l,...,N. First, let R(ti) < AN 2 . Then for all 
I — 1, N, with P^-probability tending to 1 (s = 0, 1), we have the inequalities 

N 2 <N 2 + L (t t ) <N 2 + 2R(t t ) + Ny/R(ti) < UN 2 , 
L^ti) < N v / 2R(t l ) < 3N 2 for 8 = 0, 
L l (U) > E(U) - Nyj2R{U) > E(U) - 3N 2 for s = 1. 

Therefore, we get for all I = 1, ...,N such that R(ti) < AN 2 , with P^-probability 
tending to 1, 

A(ti) < AN for s = 0, 

A(t/) > E(t l )/AN-N>E(t l )/(2^R(t^)-N for a = 1. 
Next, let R(ti) > AN 2 . Then analogously, with Pj^-probability tending to 1, 

N 2 + L\t{) <N 2 + 2R(t l ) + iVVW) < 3i2(* { ), 
N 2 + L%) > R(ti) - Ny/mtfj > R(ti)/2, 
L 1 ^) < N^/2R{t{) for s = 0, 
L\ti) > E{ti) - Ny/2R{U) for s = 1. 
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Hence, we get for all I = 1,...,N such that R(ti) > 4iV 2 , with Pj^-probability 
tending to 1, 

A(t,) < AN for s = 0, 

A(t,) > E{t l )/{2y/Wt{j)-N fors = l. 

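Thus uniformly over $u \in U_{\beta,a_d}$,

$$E_0^{(u)}(\psi_\infty) = P_0^{(u)}(\Lambda > 4N) \to 0.$$

Recalling (6.17), (6.22) let us show that under the condition

$$x^* > \phi(\beta^*) \tag{7.12}$$

we have, for some $\eta > 0$,

$$\max_{1 \le l \le N} E(t_l)/\sqrt{R(t_l)} > d^{\eta}. \tag{7.13}$$

This implies that uniformly over $u \in U_{\beta,a_d}$,

$$E_1^{(u)}(1 - \psi_\infty) = P_1^{(u)}(\Lambda \le 4N) \to 0.$$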

In order to verify (7.13), let us study the ratio $E(t)/\sqrt{R(t)}$. We have, with a
logarithmic factor $A$,

$$\frac{E(t)}{\sqrt{R(t)}} = A\, d^{s(t)}, \qquad s(t) = -\gamma/2 + 1/2 - \beta + \frac{1}{2}\min\big(t^2/2 - ((t - x)_+)^2,\ \beta - ((t - x)_+)^2/2\big).$$

Set $t_0 = x/2 + \beta/x$. Let us check that

$$s^* = \max_{0 \le t \le \sqrt{2}} s(t) \ge -\frac{\gamma}{2} + \frac{1}{2} + \begin{cases} -\beta/2, & \text{if } x \ge t_0, \\ -t_0^2/4, & \text{if } x < t_0 \le 2x, \\ -\beta + x^2/2, & \text{if } 2x < t_0. \end{cases} \tag{7.14}$$

Indeed, the relation $x \ge t_0$ is equivalent to $x^2 \ge 2\beta$. So, $s^* \ge s(\sqrt{2\beta}) = -\gamma/2 +
(1 - \beta)/2$, which implies the first relation (7.14).

The relation $2x < t_0$ is equivalent to $x^2 < 2\beta/3$ and if $2x \le \sqrt{2}$, then $s^* \ge
s(2x) = -\gamma/2 + 1/2 - \beta + x^2/2$, which implies the third relation (7.14). Let us
show that the case $2x > \sqrt{2}$, $x^2 < 2\beta/3$ is impossible under (7.12). In fact, we
have

$$\sqrt{1 - \gamma}\, \phi(\beta^*) \ge \phi(\beta), \qquad 0 < \beta < 1 - \gamma. \tag{7.15}$$

Combining (7.15) and (7.12) we find $x > \phi(\beta)$. It is easy to see that $x > \phi(\beta)$ and
$x^2 < 2\beta/3$ only if $\beta < 3/4$. This implies $2x < \sqrt{2}$.

The relation $x < t_0 \le 2x$ is equivalent to $2\beta/3 \le x^2 < 2\beta$. If $t_0 \le \sqrt{2}$,
then $s^* \ge s(t_0) = -\gamma/2 + 1/2 - t_0^2/4$, which implies the second relation (7.14).
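
The three cases of (7.14) can be checked numerically by maximizing $s(t)$ over a fine grid on $[0, \sqrt{2}]$. The Python sketch below does this for a few parameter values. The function names, the grid size and the parameter values are illustrative assumptions, and the check only covers configurations where the maximizing point used in the argument ($\sqrt{2\beta}$, $t_0$ or $2x$) lies in $[0, \sqrt{2}]$.

```python
import math


def s(t: float, x: float, beta: float, gamma: float) -> float:
    """Exponent s(t) of the ratio E(t) / sqrt(R(t))."""
    tp = max(t - x, 0.0)
    return -gamma / 2 + 0.5 - beta + 0.5 * min(t * t / 2 - tp * tp, beta - tp * tp / 2)


def s_star(x: float, beta: float, gamma: float, n: int = 20001) -> float:
    """Grid approximation of the maximum of s over [0, sqrt(2)]."""
    top = math.sqrt(2.0)
    return max(s(top * i / (n - 1), x, beta, gamma) for i in range(n))


def lower_bound(x: float, beta: float, gamma: float) -> float:
    """Right-hand side of (7.14)."""
    t0 = x / 2 + beta / x
    if x >= t0:
        inner = -beta / 2
    elif t0 <= 2 * x:
        inner = -t0 * t0 / 4
    else:
        inner = -beta + x * x / 2
    return -gamma / 2 + 0.5 + inner
```

For instance, with $\beta = 0.5$ and $\gamma = 0.2$, the values $x = 1.2$, $x = 0.8$ and $x = 0.4$ fall into the first, second and third case respectively, and in each case `s_star` should be no smaller than `lower_bound` up to the grid error.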
Let us show that the case t > y/2, 2(3/3 < x 2 is impossible if x* > 4>{(3*). In 
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fact, it is easy to check that these inequalities are simultaneously satisfied only if 
x<<h(P), P> 3/4- However, (1712]) and fl7J5|) imply 

x > (f>(j3) = fa(j3), for (3 > 3/4, 

a contradiction. By comparing (17.141) with (I6.23j) and repeating the argument from 
the end of Subsection 16.21 we see that (17.121) implies liminf s* > 0. Since s(-) is a 
Lipschitz function, we can replace the maximum over the interval [0, y/2] by the 
maximum over our grid with step S, inducing the error of order 0(5). This yields 
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